Abstract

In communications, unknown variables are usually modelled as random variables, and concepts such as independence,
entropy and information are defined in terms of the underlying probability distributions. In contrast, control
theory often treats uncertainties and disturbances as bounded unknowns having no statistical structure. The area
of networked control combines both fields, raising the question of whether it is possible to construct meaningful
analogues of stochastic concepts such as independence, Markovness, entropy and information without assuming
a probability space. This paper introduces a framework for doing so, leading to the construction of a maximin
information functional for nonstochastic variables. It is shown that the largest maximin information rate through
a memoryless, error-prone channel in this framework coincides with the block-coding zero-error capacity of the
channel. Maximin information is then used to derive tight conditions for uniformly estimating the state of a linear
time-invariant system over such a channel, paralleling recent results of Matveev and Savkin.
Index Terms

Nonprobabilistic information theory, zero-error capacity, erroneous channel, state estimation.

I. Introduction

This paper has two motivations. The first arises out of the analysis of networked control systems [2], which combine
the two different disciplines of communications and control. In communications systems, unknown quantities
are usually modelled as random variables (rv's), and central concepts such as independence, Markovness, entropy
and Shannon information are defined stochastically. One reason for this is that such systems are generally prone to
electronic circuit noise, which obeys physical laws yielding well-defined distributions. In addition, communication
systems are often used many times, and in everyday applications each phone call and data byte may not be important.
Consequently, the system designer need only ensure good performance in an average or expected sense, e.g. small
bit error rates and large signal-to-noise average power ratios.

In contrast, control is often used in safety- or mission-critical applications where performance must be guaranteed
every time a plant is used, not just on average. Furthermore, in plants that contain mechanical and chemical
components, the dominant disturbances may not necessarily arise from circuit noise, and may not follow a
well-defined probability distribution. Consequently, control theory often treats uncertainties and disturbances as bounded
unknowns or signals without statistical structure. Networked control thus raises natural questions of whether it is
possible to construct useful analogues of the stochastic concepts mentioned above, without assuming a probability
space.

Such questions are not new and some answers are available. For instance, if an rv has known range but unknown
distribution, then its uncertainty may be quantified by the logarithm of the cardinality or Lebesgue measure of this
range. This leads to the notions of Hartley entropy $H_0$ for discrete variables and Rényi differential 0th-order
entropy $h_0$ [4] for continuous variables. A related construction is the $\epsilon$-entropy, which is the log-cardinality of the
smallest partition of a given metric space such that each partition set has diameter no greater than $\epsilon > 0$ [6].
None of these concepts require any statistical structure.

Using these notions, nonstochastic measures of information can be constructed. For instance, in [8] the difference
between the marginal and worst-case conditional Rényi entropies was taken to define a nonstochastic, asymmetric
information functional, and used to study feedback control over errorless digital channels. In [9], information
transmission was defined symmetrically as the difference between the sum of the marginal and the joint Hartley
entropies of a pair of discrete variables. Continuous variables with convex ranges admitted a similar construction,
but with $H_0$ replaced by a projection-based, isometry-invariant functional. Although both these definitions possess
many natural properties, their wider operational relevance is unclear. This contrasts with Shannon's theory, which
is intimately connected to quantities of practical significance in engineering, such as the minimum and maximum
bit-rates for reliable compression and transmission [10].

The second, seemingly unrelated motivation comes from the study of zero-error capacity $C_0$ [11], [12] in
communications. The zero-error capacity of a stochastic discrete channel is the largest block-coding rate possible
across it that ensures zero probability of decoding error. This is a more stringent concept than the (ordinary) capacity
$C$ [10], defined to be the highest block-coding rate such that the probability of a decoding error is arbitrarily small.
The famous channel coding theorem [10] states that the capacity of a stochastic, memoryless channel coincides
with the highest rate of Shannon information across it, a purely intrinsic quantity. In [13], an analogous identity
for $C_0$ was found in terms of the Shannon entropy of the 'largest' rv common to the channel input and output.
However, it is known that $C_0$ does not depend on the values of the non-zero transition probabilities in the channel
and can be defined without any reference to a probabilistic framework. This strongly suggests that $C_0$ should be
expressible as the maximum rate of a suitably defined nonstochastic information index.

This paper has four main contributions. In section II, a formal framework for modelling nonstochastic uncertain
variables (uv's) is proposed, leading to analogues of probabilistic ideas such as independence and Markov chains.
In section III, the concept of maximin information $I_*$ is introduced to quantify how much the uncertainty in one
uv can be reduced by observing another. Two characterizations of $I_*$ are given there, and shown to be equivalent.
In section IV, the notion of an error-prone, stationary memoryless channel is defined within the uv framework,
and it is proved in Theorem 4.1 that the zero-error capacity $C_0$ of any such channel coincides with the largest
possible rate of maximin information across it. Finally, it is shown in section V how $I_*$ can be used to find a
tight condition (Theorem 5.1) that describes whether or not the state of a noiseless linear time-invariant (LTI)
system can be estimated with specified exponential uniform accuracy over an erroneous channel. A tight criterion
for the achievability of uniformly bounded estimation errors is also derived for when uniformly bounded additive
disturbances are present (Theorem 5.2); a similar result was derived in [14] using probability arguments but no
information theory. In a nonstochastic setting, maximin information thus serves to delineate the limits of reliable
communication and LTI state estimation over error-prone channels.

II. Uncertain Variables 

The key idea in the framework proposed here is to keep the probabilistic convention of regarding an unknown
variable as a mapping $X$ from some underlying sample space $\Omega$ to a set $\mathbf{X}$ of interest. For instance, in a dynamic
system each sample $\omega \in \Omega$ may be identified with a particular combination of initial states and exogenous noise
signals, and gives rise to a realization $X(\omega)$ denoted by lower-case $x \in \mathbf{X}$. Such a mapping $X$ is called an uncertain
variable (uv). As in probability theory, the dependence on $\omega$ is usually suppressed for conciseness, so that a
statement such as $X \in K$ means $X(\omega) \in K$. However, unlike probability theory, the formulation presented here
assumes neither a family of measurable subsets of $\Omega$, nor a measure on them.

Given another uv $Y$ taking values in $\mathbf{Y}$, write

$$\llbracket X \rrbracket := \{X(\omega) : \omega \in \Omega\},$$ (1)
$$\llbracket X|y \rrbracket := \{X(\omega) : Y(\omega) = y, \ \omega \in \Omega\},$$ (2)
$$\llbracket X, Y \rrbracket := \{(X(\omega), Y(\omega)) : \omega \in \Omega\}.$$ (3)

Call $\llbracket X \rrbracket$ the marginal range of $X$, $\llbracket X|y \rrbracket$ its conditional range given (or range conditional on) $Y = y$, and
$\llbracket X, Y \rrbracket$ the joint range of $X$ and $Y$. With some abuse of notation, denote the family of conditional ranges (2) as

$$\llbracket X|Y \rrbracket := \{\llbracket X|y \rrbracket : y \in \llbracket Y \rrbracket\},$$ (4)

with empty sets omitted. In the absence of stochastic structure, the uncertainty associated with $X$ given all possible
realizations of $Y$ is described by the set-family $\llbracket X|Y \rrbracket$. Notice that $\bigcup_{B \in \llbracket X|Y \rrbracket} B = \llbracket X \rrbracket$, i.e. $\llbracket X|Y \rrbracket$ is an
$\llbracket X \rrbracket$-cover. In addition,

$$\llbracket X, Y \rrbracket = \bigcup_{y \in \llbracket Y \rrbracket} \llbracket X|y \rrbracket \times \{y\},$$ (5)

i.e. the joint range is fully determined by the conditional and marginal ranges in a manner that parallels the
relationship between joint, conditional and marginal probability distributions.

Fig. 1. Examples of joint and marginal ranges for a) related and b) unrelated uv's.
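The range definitions (1)-(4) and the identity (5) can be illustrated on a finite sample space. The sketch below is only an illustration: the sample space `OMEGA`, the saturated-sensor map `Y` and all function names are assumptions made for this example and do not come from the paper.

```python
import itertools

# Hypothetical finite sample space: each sample omega is a pair (state, noise).
OMEGA = list(itertools.product([0, 1, 2], [-1, 0, 1]))

def X(omega):
    """Uncertain variable X: the state component of the sample."""
    return omega[0]

def Y(omega):
    """Uncertain variable Y: a noisy reading of the state, saturated to [0, 2]."""
    state, noise = omega
    return max(0, min(2, state + noise))

def marginal_range(U):
    """Marginal range [[U]], as in (1)."""
    return {U(w) for w in OMEGA}

def conditional_range(U, V, v):
    """Conditional range [[U|v]], as in (2)."""
    return {U(w) for w in OMEGA if V(w) == v}

def joint_range(U, V):
    """Joint range [[U,V]], as in (3)."""
    return {(U(w), V(w)) for w in OMEGA}

def conditional_family(U, V):
    """Family [[U|V]] of conditional ranges, as in (4)."""
    return {frozenset(conditional_range(U, V, v)) for v in marginal_range(V)}

# Identity (5): the joint range rebuilt from conditional and marginal ranges.
rebuilt = {(x, y) for y in marginal_range(Y) for x in conditional_range(X, Y, y)}
```

No probabilities appear anywhere in this construction: only set membership is tracked, which is the point of the uv framework.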

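Using this basic framework, a nonstochastic analogue of statistical independence can be defined:

Definition 2.1 (Unrelatedness): A collection of uncertain variables $Y_1, \ldots, Y_m$ is said to be (unconditionally)
unrelated if

$$\llbracket Y_1, \ldots, Y_m \rrbracket = \llbracket Y_1 \rrbracket \times \cdots \times \llbracket Y_m \rrbracket.$$

They are said to be conditionally unrelated given (or unrelated conditional on) $X$ if

$$\llbracket Y_1, \ldots, Y_m | x \rrbracket = \llbracket Y_1|x \rrbracket \times \cdots \times \llbracket Y_m|x \rrbracket, \quad x \in \llbracket X \rrbracket.$$

Like independence, unrelatedness has an alternative characterization in terms of conditioning:

Lemma 2.1: Given uncertain variables $X, Y, Z$,

a) $Y, Z$ are unrelated (Definition 2.1) iff the conditional range satisfies

$$\llbracket Y|z \rrbracket = \llbracket Y \rrbracket, \quad z \in \llbracket Z \rrbracket.$$

b) $Y, Z$ are unrelated conditional on $X$ iff

$$\llbracket Y|z, x \rrbracket = \llbracket Y|x \rrbracket, \quad (z, x) \in \llbracket Z, X \rrbracket.$$

Proof: Trivial. □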
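For finite ranges, the two characterizations of unrelatedness (the product test of Definition 2.1 and the conditioning test of Lemma 2.1 a)) can be compared directly. The following is a minimal sketch; the function names and the two example joint ranges are hypothetical.

```python
import itertools

def marginal(joint, axis):
    """Marginal range of one coordinate of a joint range of pairs."""
    return {p[axis] for p in joint}

def is_unrelated(joint):
    """Definition 2.1 for two uv's: joint range equals the product of marginals."""
    full = set(itertools.product(marginal(joint, 0), marginal(joint, 1)))
    return set(joint) == full

def is_unrelated_by_conditioning(joint):
    """Lemma 2.1 a): [[Y|z]] = [[Y]] for every z in [[Z]]; pairs are (y, z)."""
    ys = marginal(joint, 0)
    zs = marginal(joint, 1)
    return all({y for (y, z) in joint if z == z0} == ys for z0 in zs)

# Hypothetical joint ranges [[Y, Z]].
related = {(0, 0), (0, 1), (1, 1)}
unrelated = set(itertools.product([0, 1], [0, 1, 2]))
```

On both examples the two tests agree, as the lemma asserts.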

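Example: Figure 1a) illustrates the case of two related uv's $X$ and $Y$. Observe that the joint range $\llbracket X, Y \rrbracket$ is
strictly contained in the Cartesian product $\llbracket X \rrbracket \times \llbracket Y \rrbracket$ of marginal ranges. In addition, for some values $x' \in \llbracket X \rrbracket$
and $y' \in \llbracket Y \rrbracket$, the conditional ranges $\llbracket X|y' \rrbracket$ and $\llbracket Y|x' \rrbracket$ are strictly contained in the marginal ranges $\llbracket X \rrbracket$ and $\llbracket Y \rrbracket$,
respectively.

In contrast, Figure 1b) depicts the ranges when $X$ and $Y$ are unrelated. The joint range now coincides with
$\llbracket X \rrbracket \times \llbracket Y \rrbracket$, and $\llbracket X|y' \rrbracket$ and $\llbracket Y|x' \rrbracket$ coincide with $\llbracket X \rrbracket$ and $\llbracket Y \rrbracket$ respectively, for every $x' \in \llbracket X \rrbracket$ and $y' \in \llbracket Y \rrbracket$.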

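It is easy to see that for any uv's $X, Y_1, \ldots, Y_m$,

$$\llbracket X|y_1, \ldots, y_m \rrbracket \subseteq \llbracket X|y_1 \rrbracket \cap \cdots \cap \llbracket X|y_m \rrbracket$$ (6)

for all $y_i \in \llbracket Y_i \rrbracket$, $i \in [1 : m]$. Equality is possible under extra hypotheses:

Lemma 2.2: Let $X, Y_1, \ldots, Y_m$ be uncertain variables s.t. $Y_1, \ldots, Y_m$ are unrelated conditional on $X$ (Definition 2.1).
Then $\forall (y_1, \ldots, y_m) \in \llbracket Y_1, \ldots, Y_m \rrbracket$,

$$\llbracket X|y_1, \ldots, y_m \rrbracket = \llbracket X|y_1 \rrbracket \cap \cdots \cap \llbracket X|y_m \rrbracket.$$ (7)

Proof: See appendix A. □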

The second item in Lemma 2.1 motivates the following definition:

Definition 2.2 (Markov Uncertainty Chains): The uncertain variables $X$, $Y$ and $Z$ are said to form a Markov
uncertainty chain $X \leftrightarrow Y \leftrightarrow Z$ if $X, Z$ are unrelated conditional on $Y$ (Definition 2.1).

Remarks: By the symmetry of Definition 2.1, $Z \leftrightarrow Y \leftrightarrow X$ is also a Markov uncertainty chain.

Before closing this section, it is noted that the framework developed above is not equivalent to treating input 
variables with known, bounded ranges as uniformly distributed rv's. Such an approach is still probabilistic, and the 
output rv's may have nonuniform distributions despite the uniform inputs. In contrast, in the uv model here, only 
the ranges are considered, and no distributions are derived at any stage. 

For instance, consider an additive bounded noise channel with output $Y = X + N$, where the input $X$ and noise
$N$ range on the interval $[-0.5, 0.5]$. If $X$ and $N$ are taken to be mutually independent, uniform rv's, then $Y$ has a
triangular distribution on $[-1, 1]$, with values of $Y$ near zero more probable than values near the endpoints. However,
if $X$ and $N$ are treated as unrelated uv's, then all that can be inferred about $Y$ is that it has range $[-1, 1]$, with all
values in this range being equally possible.
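The contrast between the two models can be reproduced on a discretized version of this channel. The sketch below is an illustration under stated assumptions: the interval $[-0.5, 0.5]$ is replaced by a grid of step 1/10 (an arbitrary choice), and the names `GRID`, `y_range` and `pmf` are introduced only for this example.

```python
import itertools
from collections import Counter
from fractions import Fraction

# Grid approximation of [-0.5, 0.5], using exact rational arithmetic.
GRID = [Fraction(k, 10) for k in range(-5, 6)]

# uv view: X and N are unrelated, so [[X, N]] is the full product of the
# marginal ranges, and only the resulting range of Y = X + N is recorded.
y_range = {x + n for x, n in itertools.product(GRID, GRID)}
y_min, y_max = min(y_range), max(y_range)

# rv view: X and N independent and uniform on the grid, so Y = X + N has a
# discrete triangular distribution peaked at zero.
counts = Counter(x + n for x, n in itertools.product(GRID, GRID))
total = sum(counts.values())
pmf = {y: Fraction(c, total) for y, c in counts.items()}
```

The uv model retains only `y_min` and `y_max`, whereas the rv model assigns the value 0 eleven times the probability of the value 1 on this grid.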

Naturally, this lack of statistical structure does not suit all applications. However, as discussed in section I, such
structure is often excess to requirements, e.g. in problems with worst-case objectives and bounded variables as in
section V. A uv-based approach is arguably more natural in these settings.



III. Maximin Information 

The framework introduced above is now used to define a nonstochastic analogue $I_*$ of Shannon's mutual
information functional. Two characterizations of $I_*$ are developed and shown to be equivalent (Definition 3.2 and
Corollary 3.1).

Throughout this section, $X, Y$ are arbitrary uncertain variables (uv's) with marginal ranges $\llbracket X \rrbracket$ and $\llbracket Y \rrbracket$ (1),
joint range $\llbracket X, Y \rrbracket$ (3), and conditional range family $\llbracket X|Y \rrbracket$ (4). Set cardinality is denoted by $|\cdot|$, with the value $\infty$
permitted, and all logarithms are to base 2.



A. Previous Work 

It is useful to first recall the nonprobabilistic formulations of entropy and information mentioned in section I.
Though originally defined in different settings, for the sake of notational coherence they are discussed here using
the uv framework of section II.

In loose terms, the entropy of a variable quantifies the prior uncertainty associated with it. For discrete-valued
$X$, this uncertainty may be captured by the (marginal) Hartley or 0-entropy

$$H_0[X] := \log |\llbracket X \rrbracket| \in [0, \infty].$$ (8)

If $\llbracket X \rrbracket$ is a Lebesgue-measurable subset of $\mathbb{R}^n$ with measure $\mu\llbracket X \rrbracket$, then the (marginal) Rényi differential
0-entropy is defined as

$$h_0[X] := \log \mu\llbracket X \rrbracket \in [-\infty, \infty].$$ (9)

A related construction is the $\epsilon$-entropy, which is the log-cardinality of the smallest partition of a given metric space
such that each partition set has diameter no greater than $\epsilon > 0$ [5]. None of these concepts require a probability
space.

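Two distinct notions of information have been proposed based on the 0-entropies above. In [8], a worst-case
approach is taken to first define the (conditional) 0-entropy of $X$ given $Y$ as

$$H_0[X|Y] := \operatorname*{ess\,sup}_{y \in \llbracket Y \rrbracket} \log |\llbracket X|y \rrbracket| \in [0, \infty].$$ (10)

If every set in the family $\llbracket X|Y \rrbracket$ is $\mu$-measurable on $\mathbb{R}^n$, then the (conditional) differential 0-entropy of $X$ given $Y$
is

$$h_0[X|Y] := \operatorname*{ess\,sup}_{y \in \llbracket Y \rrbracket} \log \mu\llbracket X|y \rrbracket \in [-\infty, \infty].$$ (11)

Noting that Shannon information can be expressed as the difference between the marginal and conditional entropies,
a nonstochastic 0-information functional $I_0$ is then defined in [8] as

$$I_0[X;Y] := H_0[X] - H_0[X|Y] = \operatorname*{ess\,inf}_{y \in \llbracket Y \rrbracket} \log \left( \frac{|\llbracket X \rrbracket|}{|\llbracket X|y \rrbracket|} \right)$$ (12)

if $X$ is discrete-valued with $H_0[X|Y] < \infty$, and as

$$I_0[X;Y] := h_0[X] - h_0[X|Y] = \operatorname*{ess\,inf}_{y \in \llbracket Y \rrbracket} \log \left( \frac{\mu\llbracket X \rrbracket}{\mu\llbracket X|y \rrbracket} \right)$$ (13)

if $X$ is continuous-valued with $h_0[X|Y] < \infty$. In other words, the 0-information that can be gained about $X$ from $Y$
is the worst-case log-ratio of the prior to posterior uncertainty set sizes.¹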

The definition above is inherently asymmetric, i.e. $I_0[X;Y] \neq I_0[Y;X]$ in general. A different and symmetric nonstochastic
information index had been previously proposed in [9]. In that formulation, a conditional entropy was first defined as
the difference between the joint and marginal Hartley entropies, in analogy with Shannon's theory. The information
transmission $T[X;Y]$ was then defined as the difference between the marginal and conditional entropies, yielding
the symmetric formula

$$T[X;Y] := H_0[X] + H_0[Y] - H_0[X, Y].$$

Continuous variables with convex ranges admitted a similar construction, with $H_0$ replaced not with $h_0$ but a
projection-based, isometry-invariant functional.
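For finite joint ranges, the quantities (8), (10), (12) and the transmission $T$ reduce to counting. The sketch below is illustrative only: it reads the essential supremum in (10) as a plain maximum over observed values (an assumption that is natural for finite ranges), and the joint range and function names are hypothetical.

```python
from math import log2

def marginal(joint, axis):
    """Marginal range of one coordinate of a joint range of pairs (x, y)."""
    return {p[axis] for p in joint}

def hartley(s):
    """Hartley entropy (8) of a finite range, in bits."""
    return log2(len(s))

def cond_hartley(joint):
    """Worst-case conditional 0-entropy (10) of X given Y, with ess sup read as max."""
    ys = marginal(joint, 1)
    return max(log2(len({x for (x, y) in joint if y == y0})) for y0 in ys)

def info_0(joint):
    """0-information (12): I_0[X;Y] = H_0[X] - H_0[X|Y]."""
    return hartley(marginal(joint, 0)) - cond_hartley(joint)

def transmission(joint):
    """Information transmission: T[X;Y] = H_0[X] + H_0[Y] - H_0[X,Y]."""
    return hartley(marginal(joint, 0)) + hartley(marginal(joint, 1)) - hartley(joint)

def swap(joint):
    """Joint range [[Y,X]] obtained from [[X,Y]]."""
    return {(y, x) for (x, y) in joint}

# Hypothetical joint range: Y = 0 reveals X exactly, Y = 1 leaves three candidates.
joint = {(0, 0), (1, 1), (2, 1), (3, 1)}
```

Here `info_0(joint)` is $2 - \log 3$ bits while `info_0(swap(joint))` is 1 bit, which illustrates the asymmetry of $I_0$; `transmission` returns 1 bit for both orderings.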

Though these concepts are intuitively appealing and share some desirable properties with Shannon information,
they have two weaknesses. Firstly, they do not treat continuous- and discrete-valued uv's in a unified way. In
particular, it is unclear how to apply the approach of [9] to mixed pairs of variables, e.g. a digital symbol encoding
a continuous state, or to continuous variables with nonconvex ranges.

Secondly and more importantly, their operational relevance for problems involving communication has not been
generally established. While the worst-case log-ratio approach of [8] has been used to find minimum bit rates for
stabilization over an errorless digital channel, it is not obvious how to apply it if transmission errors occur.

For these reasons, an alternative approach is pursued in the remainder of this section. 

B. $I_*$ via the Overlap Partition

The nonstochastic information index $I_*$ proposed in this subsection quantifies the information that can be gained
about $X$ from $Y$ in terms of certain structural properties of the family $\llbracket X|Y \rrbracket$ of posterior uncertainty sets. These
properties are described below:

Definition 3.1 (Overlap Connectedness/Isolation):

a) A pair of points $x$ and $x' \in \llbracket X \rrbracket$ is called $\llbracket X|Y \rrbracket$-overlap connected, denoted $x \leftrightsquigarrow x'$, if $\exists$ a finite sequence
$\{\llbracket X|y_i \rrbracket\}_{i=1}^n$ of conditional ranges such that $x \in \llbracket X|y_1 \rrbracket$, $x' \in \llbracket X|y_n \rrbracket$ and each conditional range has nonempty
intersection with its predecessor, i.e. $\llbracket X|y_i \rrbracket \cap \llbracket X|y_{i-1} \rrbracket \neq \emptyset$, for each $i \in [2, \ldots, n]$.

b) A set $A \subseteq \llbracket X \rrbracket$ is called $\llbracket X|Y \rrbracket$-overlap connected if every pair of points in $A$ is overlap connected.

c) A pair of sets $A, B \subseteq \llbracket X \rrbracket$ is called $\llbracket X|Y \rrbracket$-overlap isolated if no point in $A$ is overlap connected with any point in
$B$.

d) An $\llbracket X|Y \rrbracket$-overlap isolated partition (of $\llbracket X \rrbracket$) is a partition of $\llbracket X \rrbracket$ where every pair of distinct member-sets
is overlap isolated.

e) An $\llbracket X|Y \rrbracket$-overlap partition is an overlap-isolated partition each member-set of which is overlap connected.

¹Note that in 1965, Kolmogorov had defined $\log |\llbracket X|y \rrbracket|$ as a 'combinatorial' conditional entropy and the log-ratio $\log(|\llbracket X \rrbracket| / |\llbracket X|y \rrbracket|)$ as a
measure of information gain. However, these quantities have the defect of depending on the observed value $Y = y$, and thus are associated
with a specific posterior uncertainty set $\llbracket X|y \rrbracket$. In contrast, (10)-(13) and (16) are functions of the family $\llbracket X|Y \rrbracket$ of all possible posterior
uncertainty sets.

Remarks: For conciseness, the qualifier $\llbracket X|Y \rrbracket$ will often be dropped when there is no risk of confusion about the
conditional range family of interest. Note that any point or set is automatically overlap connected with itself. In
addition, $x'$ lies in the same overlap partition set as $x$ iff $x' \leftrightsquigarrow x$.

Symmetry and transitivity guarantee that a unique overlap partition always exists: 



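Lemma 3.1 (Unique Overlap Partition): There is a unique $\llbracket X|Y \rrbracket$-overlap partition of $\llbracket X \rrbracket$ (Definition 3.1), denoted
$\llbracket X|Y \rrbracket_*$. Every set $C \in \llbracket X|Y \rrbracket_*$ is expressible as

$$C = \{x \in \llbracket X \rrbracket : x \leftrightsquigarrow C\} = \bigcup_{B \in \llbracket X|Y \rrbracket : B \leftrightsquigarrow C} B.$$ (14)

Furthermore, every $\llbracket X|Y \rrbracket$-overlap isolated partition $\mathcal{P}$ of $\llbracket X \rrbracket$ satisfies

$$|\mathcal{P}| \leq |\llbracket X|Y \rrbracket_*|,$$ (15)

with equality iff $\mathcal{P} = \llbracket X|Y \rrbracket_*$.
Proof: See appendix B. □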

Remarks: The self-referential identities in (14) are needed to prove certain key results later. The first equality
says that each element $C$ of the overlap partition coincides with the set of all points that are overlap connected
with it. The second states that every such $C$ is expressible as a union of elements of the set family $\llbracket X|Y \rrbracket$.

Observe that from Definition 3.1, overlap-isolated partitions are precisely those partitions $\mathcal{P}$ of $\llbracket X \rrbracket$ with the
property that every conditional range $\llbracket X|y \rrbracket$ lies entirely inside one member set $P \in \mathcal{P}$. In other words, each possible
observation $y \in \llbracket Y \rrbracket$ unambiguously identifies exactly one partition set $P$ containing $x$. Equivalently, these partition
sets can be thought of as defining a discrete-valued function, or quantizer, on $\llbracket X \rrbracket$. The more sets there are in
$\mathcal{P}$, the more distinct values this quantizer can take, and so the more refined the knowledge that can be unequivocally
gained about $X$.

By the result above, $\llbracket X|Y \rrbracket_*$ is precisely the overlap-isolated partition of maximum cardinality. This leads naturally
to the definition below:

Definition 3.2: The maximin information between $X$ and $Y$ is defined as

$$I_*[X;Y] := \log |\llbracket X|Y \rrbracket_*|,$$ (16)

where $\llbracket X|Y \rrbracket_*$ is the unique $\llbracket X|Y \rrbracket$-overlap partition of $\llbracket X \rrbracket$ (Lemma 3.1).

Remarks: By the discussion above, $I_*[X;Y]$ represents the most refined knowledge that can be gained about $X$
from observations of $Y$. Note that this definition applies to both continuous- and discrete-valued uv's. Also note
that the self-information $I_*[X;X]$ is identical to $H_0[X]$.

Example: Consider uv's $X$ and $Y$ with the one-dimensional conditional range family $\llbracket X|Y \rrbracket = \{\llbracket X|y_i \rrbracket : i = 1, \ldots, 5\}$
and overlap partition $\llbracket X|Y \rrbracket_* = \{P_1, P_2\}$ depicted in Figure 2. Observe that any pair of points in $P_1$ or in $P_2$ is
overlap connected, and no point in $P_1$ is overlap connected to a point in $P_2$. Also note that $\{P_1, P_2\}$ is the finest
partition of $\llbracket X \rrbracket$ having member sets that can always be unambiguously determined from $Y$; a partition with larger
cardinality would necessarily contain two or more neighbouring partition sets intersected by the same posterior set
$\llbracket X|y_i \rrbracket$, and the observation $Y = y_i$ would then correspond to either partition set. Thus the maximin information
between $X$ and $Y$ is $\log |\llbracket X|Y \rrbracket_*| = \log 2 = 1$ bit.
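When the conditional ranges are closed intervals on the real line, the overlap partition can be found by sorting and merging. The sketch below assumes a hypothetical family of five intervals (these are not the sets of Figure 2, whose coordinates are not given in the text); the function name is likewise an assumption of this example.

```python
from math import log2

def overlap_partition(intervals):
    """Merge closed intervals (a, b) that overlap directly or through a chain.

    Each merged interval is one member set of the overlap partition of the
    union of the given intervals.
    """
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            # Nonempty intersection with the current component: extend it.
            merged[-1][1] = max(merged[-1][1], b)
        else:
            # No overlap with anything seen so far: start a new component.
            merged.append([a, b])
    return [tuple(m) for m in merged]

# Hypothetical conditional ranges [[X|y_i]], i = 1, ..., 5.
family = [(0.0, 1.0), (0.8, 2.0), (1.5, 3.0), (4.0, 5.0), (4.5, 6.0)]
partition = overlap_partition(family)
maximin_info = log2(len(partition))
```

The first three intervals chain together and the last two overlap, so the partition has two sets and the maximin information is 1 bit, mirroring the structure of the example above.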

It is easy to verify that $I_* \neq I_0$.

Example: Let $X$ and $Z$ be unrelated uv's with $\llbracket X \rrbracket = \{0, 1\}$ and $\llbracket Z \rrbracket = \{0, 1\}$, and define the uv $Y$ by $Y = X$ if
$Z = 0$ and $Y = 2$ if $Z = 1$. The family $\llbracket Y|X \rrbracket$ consists of the sets $\llbracket Y|0 \rrbracket = \{0, 2\}$ and $\llbracket Y|1 \rrbracket = \{1, 2\}$. The overlap
partition $\llbracket Y|X \rrbracket_*$ has only one set, $\{0, 1, 2\}$, so $I_*[Y;X] = \log 1 = 0$. However $I_0[Y;X] = \log \frac{3}{2}$, since the largest
cardinality of sets in $\llbracket Y|X \rrbracket$ is 2.

Fig. 2. Example of an overlap partition.
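The numbers in the preceding example with $Y = X$ if $Z = 0$ and $Y = 2$ if $Z = 1$ can be checked by direct enumeration. This is a minimal sketch; the helper `overlap_partition` and the variable names are introduced here for illustration, and the essential supremum in (10) is read as a maximum.

```python
from math import log2
from itertools import product

# Unrelated uv's X and Z: [[X, Z]] is the full product {0,1} x {0,1}.
samples = list(product([0, 1], [0, 1]))

def Y(x, z):
    """Y = X if Z = 0, and Y = 2 if Z = 1."""
    return x if z == 0 else 2

# Conditional range family [[Y|X]], indexed by the value of X.
family = {x0: {Y(x, z) for (x, z) in samples if x == x0} for x0 in (0, 1)}

def overlap_partition(sets):
    """Merge intersecting sets until the remaining sets are pairwise disjoint."""
    parts = []
    for s in sets:
        s = set(s)
        keep = []
        for p in parts:
            if p & s:
                s |= p          # absorb every existing part that s touches
            else:
                keep.append(p)
        parts = keep + [s]
    return parts

partition = overlap_partition(family.values())
y_range = set().union(*family.values())
maximin = log2(len(partition))                                          # I_*[Y;X]
info_0 = log2(len(y_range)) - max(log2(len(s)) for s in family.values())  # I_0[Y;X]
```

The two conditional ranges share the value 2, so they merge into a single partition set and the maximin information vanishes, while the 0-information stays positive.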



Finally, note that $I_*[X;Y]$ was originally defined in [1] through a log-ratio expression of the form

$$\sup \min \log(\cdot)$$

involving the family of all finite subsets of $\llbracket X \rrbracket$; hence the name 'maximin' information. This log-ratio
characterization is close in spirit to (12)-(13) and can be shown to be equivalent to (16). However, since it does
not have as simple an interpretation as (16) and is not needed for any of the results here, there will be no further
discussion of it in what follows.



C. I* via the Taxicab Partition 

The definition of maximin information above is based purely on the conditional range family $\llbracket X|Y \rrbracket$. As $\llbracket Y|X \rrbracket$
will not generally be the same, it may seem that $I_*$ could be asymmetric in its arguments. However, it turns out
that $I_*$ can be reformulated symmetrically in terms of the joint range $\llbracket X, Y \rrbracket$. A few additional concepts are needed
in order to present this characterization.

Definition 3.3 (Taxicab Connectedness/Isolation):

a) A pair of points $(x, y)$ and $(x', y') \in \llbracket X, Y \rrbracket$ is called taxicab connected if there is a taxicab sequence connecting
them, i.e. a finite sequence $\{(x_i, y_i)\}_{i=1}^n$ of points in $\llbracket X, Y \rrbracket$ such that $(x_1, y_1) = (x, y)$, $(x_n, y_n) = (x', y')$ and
each point differs in at most one coordinate from its predecessor, i.e. $y_i = y_{i-1}$ and/or $x_i = x_{i-1}$, for each
$i \in [2, \ldots, n]$.

b) A set $A \subseteq \llbracket X, Y \rrbracket$ is called taxicab connected if every pair of points in $A$ is taxicab connected in $\llbracket X, Y \rrbracket$.

c) A pair of sets $A, B \subseteq \llbracket X, Y \rrbracket$ is called taxicab isolated if no point in $A$ is taxicab connected in $\llbracket X, Y \rrbracket$ to any point in
$B$.

d) A taxicab-isolated partition (of $\llbracket X, Y \rrbracket$) is a cover of $\llbracket X, Y \rrbracket$ such that every pair of distinct sets in the cover
is taxicab isolated.

e) A taxicab partition (of $\llbracket X, Y \rrbracket$) is a taxicab-isolated partition of $\llbracket X, Y \rrbracket$ each member-set of which is taxicab
connected.

Fig. 3. Path- vs. taxicab-connectedness ($\llbracket X, Y \rrbracket$ = shaded area). a) Two points connected in taxicab and usual senses; b) disconnected in taxicab, but not usual, sense; c) taxicab disconnected, but connected in usual sense.

Remarks: Note that any point or set is automatically taxicab connected with itself. In addition, taxicab
connectedness/isolation in $\llbracket X, Y \rrbracket$ is identical to that in $\llbracket Y, X \rrbracket$, with the order of elements in each pair reversed. Consequently,
any taxicab-isolated partition of $\llbracket X, Y \rrbracket$ is in one-to-one correspondence with one of $\llbracket Y, X \rrbracket$.

Taxicab-isolated partitions have the property that the particular member set $T$ that contains a given point $(x, y)$
is uniquely determined by $x$ alone and by $y$ alone. The argument is by contradiction: if $x$ is associated with two sets
$T, T'$ in the taxicab-isolated partition, i.e. $(x, y) \in T$ and $(x, y') \in T'$ for distinct $y, y' \in \llbracket Y \rrbracket$, then $T$ and $T'$ would be
taxicab-connected by the sequence $((x, y), (x, y'))$. In other words, the sets of a taxicab-isolated partition represent
posterior knowledge that can always be agreed on by two agents who separately observe realizations of $X$ and $Y$.

Lemma 3.2 (Taxicab- $\Leftrightarrow$ Overlap-Connectedness): Any two points $(x, y), (x', y') \in \llbracket X, Y \rrbracket$ are taxicab connected
(Definition 3.3) iff $x \leftrightsquigarrow x'$ in $\llbracket X|Y \rrbracket$ (Definition 3.1).

Thus any set $A \subseteq \llbracket X, Y \rrbracket$ is taxicab connected iff its $x$-axis projection $A^+ \subseteq \llbracket X \rrbracket$ is overlap connected.
Similarly, any two sets $A, B \subseteq \llbracket X, Y \rrbracket$ are taxicab isolated (Definition 3.3) iff $A^+, B^+ \subseteq \llbracket X \rrbracket$ are overlap isolated
(Definition 3.1).

Proof: See appendix C. □

Due to this equivalence between the two notions of connectedness, the same symbol $\leftrightsquigarrow$ is used. The result
below makes another link:
Theorem 3.1 (Unique Taxicab Partition): There is a unique taxicab partition (Definition 3.3) $\mathcal{T}[X;Y]$ of $\llbracket X, Y \rrbracket$
(3).

In addition, every taxicab-isolated partition $\mathcal{Q}$ of $\llbracket X, Y \rrbracket$ satisfies

$$|\mathcal{Q}| \leq |\mathcal{T}[X;Y]|,$$ (17)

with equality iff $\mathcal{Q} = \mathcal{T}[X;Y]$.

Furthermore, a one-to-one correspondence from $\mathcal{T}[X;Y]$ to the overlap partition $\llbracket X|Y \rrbracket_*$ (Lemma 3.1) is obtained
by projecting the sets of the former onto $\llbracket X \rrbracket$.

Proof: See appendix D. □

The last statement of this theorem leads immediately to the following alternative characterization of maximin
information:

Corollary 3.1 ($I_*$ via Taxicab Partition): The maximin information $I_*$ (16) satisfies the identity

$$I_*[X;Y] = \log |\mathcal{T}[X;Y]|,$$

where $\mathcal{T}[X;Y]$ is the unique taxicab partition of $\llbracket X, Y \rrbracket$ (Theorem 3.1).
Thus $I_*[X;Y] = I_*[Y;X]$.
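For a finite joint range, the taxicab partition can be computed as the connected components of a graph in which two points are linked when they share an $x$- or a $y$-coordinate. The sketch below uses a small union-find structure; the function names and both joint ranges are hypothetical and serve only to illustrate the corollary.

```python
from math import log2

def taxicab_partition(joint):
    """Connected components of a joint range, linking points that share x or y."""
    parent = {}

    def find(u):
        # Union-find lookup with path halving; unseen nodes become their own root.
        while parent.setdefault(u, u) != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for x, y in joint:
        # Tag coordinates so that x-values and y-values never collide.
        parent[find(("x", x))] = find(("y", y))

    groups = {}
    for x, y in joint:
        groups.setdefault(find(("x", x)), set()).add((x, y))
    return list(groups.values())

def maximin_info(joint):
    """Log-cardinality of the taxicab partition, in bits."""
    return log2(len(taxicab_partition(joint)))

def swap(joint):
    """Joint range [[Y,X]] obtained from [[X,Y]]."""
    return {(y, x) for (x, y) in joint}

# Hypothetical joint ranges [[X, Y]].
two_blocks = {(0, "a"), (1, "a"), (1, "b"), (2, "c"), (3, "c")}
one_block = {(0, 0), (0, 1), (1, 1)}
```

Swapping the roles of the two coordinates leaves the number of components unchanged, which is the symmetry stated in the corollary; projecting the components of `two_blocks` onto the first coordinate gives the sets {0, 1} and {2, 3}.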





Remarks: From the discussion following Definition 3.3, the bound (17) means that $\mathcal{T}[X;Y]$ represents the
finest posterior knowledge that can be agreed on from individually observing $X$ and $Y$. The log-cardinality of this
partition has considerable intuitive appeal as an index of information. Indeed, if $X$ and $Y$ are discrete rv's, then
the elements of the taxicab partition correspond to the connected components of the bipartite graph that describes
$(x, y)$ pairs with nonzero joint probability. In [13], the Shannon entropy of these connected components was called
zero-error information and used to derive an intrinsic but stochastic characterization of the zero-error capacity $C_0$ of
discrete memoryless channels. Maximin information corresponds rather to the Hartley entropy of these connected
components. In section IV, it will be seen to yield an analogous nonstochastic characterization that is valid for
discrete- or continuous-valued channels.

Example: The shaded regions in Figure 4 depict the joint range $\llbracket X, Y \rrbracket$ of uv's $X, Y$ having the conditional range
family $\llbracket X|Y \rrbracket$ of Figure 2. The taxicab partition $\mathcal{T}[X;Y]$ consists of the sets $T_1$ and $T_2$; it can be seen that every
pair of points in each set is taxicab connected, and no point in one set is taxicab-connected with a point in the
other. Projecting $T_1$ and $T_2$ onto $\llbracket X \rrbracket$ yields $P_1$ and $P_2$, the sets comprising the overlap partition $\llbracket X|Y \rrbracket_*$. Similarly,
$\llbracket Y|X \rrbracket_*$ consists of the projections $Q_1$ and $Q_2$ of $T_1$ and $T_2$ onto $\llbracket Y \rrbracket$.

If two agents observe $X$ and $Y$ separately, then they will always be able to agree on the index $Z \in \{1, 2\}$ of the
unique taxicab partition set $T_Z$ that contains $(X, Y)$, since it is also the index of the overlap partition sets $P_Z$ and
$Q_Z$ that contain $X$ and $Y$ respectively. The amount of information they share is then $\log |\mathcal{T}[X;Y]| = \log |\llbracket X|Y \rrbracket_*| =
\log |\llbracket Y|X \rrbracket_*| = 1$ bit.

D. Properties of Maximin Information 

Two important properties of maximin information are now established. These properties are also exhibited by
Shannon information and will be needed to prove Theorem 5.1.



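Lemma 3.3 (More Data Can't Hurt): The maximin information $I_*$ (16) satisfies

$$I_*[X;Y] \leq I_*[X;Y,Z].$$ (18)

Proof: By Definition 3.1, every set $C \in \llbracket X|Y,Z \rrbracket_*$ is overlap connected in $\llbracket X|Y,Z \rrbracket$. As $\llbracket X|y,z \rrbracket \subseteq \llbracket X|y \rrbracket$, $C$ is also
overlap connected in $\llbracket X|Y \rrbracket$. Pick a set $C' \in \llbracket X|Y \rrbracket_*$ that intersects $C$. As $C'$ is overlap connected in $\llbracket X|Y \rrbracket$, it is also
overlap connected with $C$. Thus $C \subseteq C'$, since by (14) $C'$ must include all points $\leftrightsquigarrow C'$. Consequently, there is only
one $C'$ for each $C$.

Furthermore, since $\llbracket X|Y,Z \rrbracket_*$ covers $\llbracket X \rrbracket$, every set of $\llbracket X|Y \rrbracket_*$ must intersect and thus include some of its set(s).
Thus the map $C \mapsto C'$ is a surjection from $\llbracket X|Y,Z \rrbracket_* \to \llbracket X|Y \rrbracket_*$, implying that $|\llbracket X|Y,Z \rrbracket_*| \geq |\llbracket X|Y \rrbracket_*|$. □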

Lemma 3.4 (Data Processing): If $X \leftrightarrow Y \leftrightarrow Z$ is a Markov uncertainty chain (Definition 2.2), then the maximin
information $I_*$ (16) satisfies

$$I_*[X;Z] \leq I_*[X;Y].$$ (19)

Proof: By Lemma 3.3,

$$I_*[X;Z] \leq I_*[X;Y,Z] \overset{(16)}{=} \log |\llbracket X|Y,Z \rrbracket_*|.$$

By Definition 2.2, $\llbracket X|y,z \rrbracket = \llbracket X|y \rrbracket$ for every $y \in \llbracket Y \rrbracket$ and $z \in \llbracket Z|y \rrbracket$, so $\llbracket X|Y,Z \rrbracket_* = \llbracket X|Y \rrbracket_*$. Substituting this into
the RHS of the equation above and applying (16) again completes the proof. □

Remark: By the symmetry of Markov uncertainty chains and maximin information, $I_*[X;Z] \leq I_*[Y;Z]$.

Fig. 4. Taxicab and Overlap Partitions



E. Discussion 

Maximin information is a more conservative index than Shannon information $I$. For instance, Corollary 3.1
implies that unrelated uv's must share zero maximin information, but the converse does not hold, unlike the analogous
case with Shannon information. This is because $I_*[X;Y]$ is the log of the largest cardinality of $\llbracket X, Y \rrbracket$-partitions such that the
unique partition set containing any realization $(x, y)$ can be determined by observing either $x$ or $y$ alone. Even if $X$
and $Y$ are related, there may be no way to split the joint range into two or more sets that are each unambiguously
identifiable in this way.

Example: Let $\llbracket X, Y \rrbracket = \{(0, 0), (0, 1), (1, 1)\}$. As $\llbracket X, Y \rrbracket \neq \llbracket X \rrbracket \times \llbracket Y \rrbracket = \{0, 1\}^2$, $X$ and $Y$ are related. However,
every pair of points in $\llbracket X, Y \rrbracket$ is taxicab-connected, so $\mathcal{T}[X;Y]$ has only one set, $\llbracket X, Y \rrbracket$, and $I_*[X;Y] = 0$. See
also Figure 5 for other examples.

This conservatism might suggest that $I_*$ could be derived from Shannon information via a variational principle,
i.e. as

$$\inf \{ I[X;Y] : X, Y \text{ rv's with given support } \llbracket X, Y \rrbracket \}.$$

Fig. 5. Zero $I_*$ does not imply unrelatedness ($\llbracket X, Y \rrbracket$ = shaded area). Panels: $X, Y$ related, but $I_*[X;Y] = 0$; $X, Y$ related, $I_*[X;Y] = \log_2 2 = 1$; $X, Y$ unrelated, $I_*[X;Y] = 0$.



However, such an approach would be too conservative, since the infimum can be zero even when the maximin
information is strictly positive. A formal proof of this is not given due to space constraints, but a sketch of the
argument follows. Let $q$ be a (suitably well-behaved) joint probability density function (pdf) that is strictly positive
on the Lebesgue measurable support $\llbracket X, Y \rrbracket$ of Figure 3(b) and that has finite Shannon information. Pick a point
$(x', y')$ in the interior of the support and for any $\epsilon$ and sufficiently small $r > 0$, let $X, Y$ be rv's with joint pdf
$p_{X,Y} = (1 - \epsilon) w_{x',y',r} + \epsilon q$, where $w_{x',y',r}$ is a uniform pdf on a square of dimension $r > 0$ centred at $(x', y')$.
Observe that if $(r, \epsilon) = (0, 0)$, the joint pdf becomes a unit delta function centred at $(x', y')$, which automatically
yields zero mutual information. As $I[X;Y]$ must vary continuously with $\epsilon, r > 0$, it follows that $I[X;Y] \to 0$ as
$(\epsilon, r) \to (0, 0)$. The nonnegativity of Shannon information then implies that the infimum above must be zero, but
the maximin information remains 1.

IV. Channels and Capacity 

In this section, a connection is made between maximin information and the problem of transmission over an 
erroneous, discrete-time channel. 

A. Stationary Memoryless Uncertain Channels 

Let $\mathbf{X}^\infty$ be the space of all $\mathbf{X}$-valued, discrete-time functions $x : \mathbb{Z}_{\geq 0} \to \mathbf{X}$. An uncertain (discrete-time) signal $X$
is a mapping from the sample space $\Omega$ to some function space $\mathcal{X} \subseteq \mathbf{X}^\infty$ of interest. Confining this mapping to any
time $t \in \mathbb{Z}_{\geq 0}$ yields an uncertain variable (uv), denoted $X(t)$. The signal segment $(X(t))_{t=a}^{b}$ is denoted $X(a : b)$. As
with uv's, the dependence on $\omega \in \Omega$ will not usually be indicated: thus the statements $X \in A$ and $X(t) = x(t)$ mean
that $X(\omega) \in A$ and $X(\omega)(t) = x(t)$ respectively. Also note that $\llbracket X \rrbracket$ here is a subset of the function space $\mathcal{X}$.

A nonstochastic parallel of the standard notion of a stationary memoryless channel in communications can be 
defined as follows: 
Definition 4.1: Given an input function space $\mathcal{X} \subseteq \mathbf{X}^\infty$ and a set-valued transition function $T : \mathbf{X} \to 2^{\mathbf{Y}}$, a stationary memoryless uncertain channel maps any uncertain input signal $X$ with range $[\![X]\!] \subseteq \mathcal{X}$ to an uncertain output signal $Y$ so that

\[ [\![Y(0:t) \,|\, x(0:t)]\!] = T(x(0)) \times \cdots \times T(x(t)), \quad \forall x(0:t) \in [\![X(0:t)]\!],\ t \in \mathbb{Z}_{\ge 0}. \tag{20} \]

The set-valued reverse transition function $R : \mathbf{Y} \to 2^{\mathbf{X}}$ of the channel is

\[ R(y) := \{ x \in \mathbf{X} : T(x) \ni y \}, \quad y \in \mathbf{Y}. \tag{21} \]

Remarks: The set-valued map $T$ here plays the role of a time-invariant transition probability matrix or kernel in communications theory. The input function class $\mathcal{X}$ is included to handle possible constraints such as limited time-averaged transmission power or input run-lengths, though in the rest of this paper $\mathcal{X}$ is taken as $\mathbf{X}^\infty$.

The definition above implicitly assumes no feedback from the receiver back to the transmitter. If such feedback is present then by arguments similar to Massey's [15], a more general definition must be used; see [16].
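As a rough illustration of Definition 4.1, the following Python sketch represents a set-valued transition function on finite alphabets as a dictionary and derives the reverse transition function (21) and the conditional output range (20) from it. The five-symbol channel used here is an assumed example, and the function names are hypothetical; none of them come from the paper.

```python
from itertools import product

# Assumed example channel (hypothetical): input x in {0,...,4} may be
# received as x or x+1 (mod 5).
T = {x: {x, (x + 1) % 5} for x in range(5)}

def reverse_transition(T):
    """R(y) = {x : y in T(x)}, following (21)."""
    R = {}
    for x, outputs in T.items():
        for y in outputs:
            R.setdefault(y, set()).add(x)
    return R

def conditional_output_range(T, x_seq):
    """[[Y(0:t) | x(0:t)]] = T(x(0)) x ... x T(x(t)), following (20)."""
    return set(product(*(T[x] for x in x_seq)))

R = reverse_transition(T)
```

For this assumed channel, `R[0]` contains the two inputs 0 and 4, and the conditional output range of a length-3 input sequence has 2^3 = 8 elements, since each use of the channel contributes its transition set independently of the others.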

The following lemma shows that the conditional range of the input sequence given an output sequence is defined 
by the reverse transition function and the unconditional input range. 



Lemma 4.1: Given a stationary memoryless uncertain channel (Definition 4.1) with reverse transition function $R$ (21),

\[ [\![X(0:t) \,|\, y(0:t)]\!] = [\![X(0:t)]\!] \cap \underbrace{\prod_{i=0}^{t} R(y(i))}_{=:\, R(y(0:t))}, \quad y(0:t) \in \mathbf{Y}^{t+1}, \tag{22} \]

and for any valid pair $X, Y$ of uncertain input and output signals.
Proof: See appendix E. □

The largest information rate across a channel is formally defined as follows: 



Definition 4.2: The peak maximin information rate of a stationary memoryless uncertain channel (Definition 4.1) is

\[ R_* := \sup_{t \in \mathbb{Z}_{\ge 0},\ X : [\![X]\!] \subseteq \mathcal{X}} \frac{I_*[X(0:t); Y(0:t)]}{t+1}, \tag{23} \]

where $\mathcal{X}$ is the input function space and $Y$ is the uncertain output signal yielded by the uncertain input signal $X$.



It can be shown that the term under the supremum over time on the RHS is super-additive. A standard result called Fekete's lemma then states that the supremum over time on the RHS of (23) is achieved in the limit as $t \to \infty$. This leads immediately to the following identity:

Lemma 4.2: For any stationary memoryless uncertain channel (Definition 4.1), the peak maximin information rate (23) satisfies

\[ R_* = \lim_{t \to \infty}\ \sup_{X : [\![X]\!] \subseteq \mathcal{X}} \frac{I_*[X(0:t); Y(0:t)]}{t+1}, \tag{24} \]

where $\mathcal{X}$ is the input function space and $Y$ is the uncertain output signal yielded by $X$.



B. Zero-Error Capacity 

It is next shown how $R_*$ relates to the concept of zero-error capacity $C_0$ [11], [12], which Shannon introduced after its more famous sibling the (ordinary) capacity $C$ [10]. As described in section I, the zero-error capacity of a stochastic channel is defined as the largest average block-coding bit-rate at which input "messages" can be transmitted while ensuring that the probability of a decoding error is exactly zero (not just arbitrarily small, as with the usual capacity). It is well known that $C_0$ does not depend on the probabilistic nature of the channel, in the sense that the specific values of the nonzero transition probabilities play no role. This suggests that $C_0$ ought to be definable using the nonstochastic framework of this paper.
To see this, observe that a length-$(t+1)$ zero-error block code may be represented as a finite set $F \subseteq \mathbf{X}^{t+1}$, where each codeword $f \in F$ corresponds to a distinct "message". The average coding rate is thus $(\log|F|)/(t+1)$ bits/sample, under the constraint that any received output sequence $y(0:t)$ corresponds to at most one possible $f$. In other words, $\forall t \in \mathbb{Z}_{\ge 0}$, a set $F \subseteq \mathbf{X}^{t+1}$ of codewords is valid iff for each possible channel output sequence $y(0:t) \in \mathbf{Y}^{t+1}$, $|F \cap R(y(0:t))| \le 1$. Thus the zero-error capacity may be defined operationally as

\[ C_0 := \sup_{t \in \mathbb{Z}_{\ge 0},\ F \in \mathcal{F}(\mathbf{X}^{t+1})} \frac{\log|F|}{t+1} = \lim_{t \to \infty}\ \sup_{F \in \mathcal{F}(\mathbf{X}^{t+1})} \frac{\log|F|}{t+1}, \tag{25} \]

where the limit again follows from superadditivity and

\[ \mathcal{F}(\mathbf{X}^{t+1}) := \{ F \in \mathcal{G}(\mathbf{X}^{t+1}) : \forall y(0:t) \in \mathbf{Y}^{t+1},\ |F \cap R(y(0:t))| \le 1 \}, \tag{26} \]

with $\mathcal{G}(\mathbf{X}^{t+1})$ the family of all finite subsets of $\mathbf{X}^{t+1}$ and $R$ the reverse block transition function (22).
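The validity condition for a zero-error block code can be checked mechanically on a small finite channel. The Python sketch below is only an illustration under stated assumptions: the five-symbol "pentagon" transition function is an assumed example, the helper names are hypothetical, and the exhaustive search is a plain branch-and-bound rather than anything proposed in the paper. It uses the fact that $|F \cap R(y(0:t))| \le 1$ for every output sequence is equivalent to no two distinct codewords sharing a possible output sequence.

```python
from itertools import product
from math import log2

# Assumed example channel: input x in {0,...,4} may be received as x or x+1 (mod 5).
T = {x: {x, (x + 1) % 5} for x in range(5)}

def confusable(u, v):
    """True if some output sequence is possible under both input sequences,
    i.e. T(u_i) and T(v_i) intersect at every time index i."""
    return all(T[a] & T[b] for a, b in zip(u, v))

def is_zero_error_code(F):
    """No pair of distinct codewords may be confusable."""
    words = list(F)
    return not any(confusable(words[i], words[j])
                   for i in range(len(words))
                   for j in range(i + 1, len(words)))

def max_code_size(length):
    """Largest valid code of the given block length, by exhaustive search."""
    words = list(product(range(5), repeat=length))
    best = 0

    def extend(size, candidates):
        nonlocal best
        best = max(best, size)
        for i, w in enumerate(candidates):
            # Prune: even taking every remaining candidate cannot beat `best`.
            if size + len(candidates) - i <= best:
                break
            rest = [c for c in candidates[i + 1:] if not confusable(w, c)]
            extend(size + 1, rest)

    extend(0, words)
    return best

def rate(code_size, length):
    """Average coding rate in bits per sample (base-2 logarithm assumed)."""
    return log2(code_size) / length
```

On this assumed channel, single-symbol codes carry at most 2 messages, while the length-2 code $\{(i, 2i \bmod 5) : i = 0,\dots,4\}$ is valid and carries 5, so the rate under the supremum in (25) increases with block length.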

The main result of this section shows that Co admits an intrinsic characterization in terms of maximin information 
theory: 

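Theorem 4.1 ($C_0$ via Maximin Information): For any stationary memoryless uncertain channel with input function space $\mathcal{X} = \mathbf{X}^\infty$ (Definition 4.1), the peak maximin information rate $R_*$ (Definition 4.2) equals the zero-error capacity $C_0$.

Proof: As $[\![X(0:t)|Y(0:t)]\!]_*$ is a partition of $[\![X(0:t)]\!]$,

\begin{align*}
|[\![X(0:t)|Y(0:t)]\!]_*| &= \sup_{F \in \mathcal{G}[\![X(0:t)]\!] \,:\, \forall C \in [\![X(0:t)|Y(0:t)]\!]_*,\ |F \cap C| \le 1} |F| \\
&\le \sup_{F \in \mathcal{G}[\![X(0:t)]\!] \,:\, \forall B \in [\![X(0:t)|Y(0:t)]\!],\ |F \cap B| \le 1} |F| \\
&\overset{(22)}{=} \sup_{F \in \mathcal{G}[\![X(0:t)]\!] \,:\, \forall y(0:t) \in \mathbf{Y}^{t+1},\ |F \cap R(y(0:t))| \le 1} |F| \\
&\le \sup_{F \in \mathcal{F}(\mathbf{X}^{t+1})} |F| \tag{27} \\
&\overset{(25)}{\le} 2^{C_0 (t+1)},
\end{align*}

so that, by (16) and (23),

\[ R_* \le C_0. \tag{28} \]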



It is next shown that $\forall t \in \mathbb{Z}_{\ge 0}$, $\exists$ a uv $X(0:t)$ for which (27) is an equality. For any $F \in \mathcal{F}(\mathbf{X}^{t+1})$ (26), let $X(0:t)$ be a surjection from $\Omega \to F$. Then no point in $[\![X(0:t)]\!] = F$ is overlap connected (Definition 3.1) with any other, since at least one of the conditional ranges $[\![X(0:t)|y(0:t)]\!]$ overlap-connecting them would then have 2 or more distinct points; this is impossible by (22) and (26). Thus the overlap partition $[\![X(0:t)|Y(0:t)]\!]_*$ (Lemma 3.1) of $[\![X(0:t)]\!]$ is a family of $|[\![X(0:t)]\!]| = |F|$ singletons, comprising the individual points of $[\![X(0:t)]\!] = F$.

If $\mathcal{F}(\mathbf{X}^{t+1})$ has a set $F^*$ of maximum cardinality, then choosing $F = F^*$ forces the LHS of (27) to coincide with the RHS. Otherwise, the RHS of (27) will be infinite and $F$ may be chosen to have arbitrarily large cardinality, again yielding equality in (27), by (16). This achieves equality in (28). □

Remarks: This result shows that the largest average bit-rate that can be transmitted across a stationary memoryless 
uncertain channel with errorless decoding coincides with the largest average maximin information rate across it. 
This parallels Shannon's channel coding theorem for stochastic memoryless channels and arguably makes I* more 
relevant for problems involving communication than other nonstochastic information indices. 

It must be noted that ensuring exactly zero decoding errors is a stringent requirement and is impossible over many common channels, such as the binary symmetric, binary erasure and additive white Gaussian noise channels, which have $C_0 = 0$. However, a number of channels are known to possess nonzero $C_0$, such as the pentagon and additive bounded noise channels. Zero-error capacity is also an object of study in graph theory, where it is related to the clique number. See [12] for a comprehensive survey of the literature on $C_0$.



^As in the mutual-information characterization of Shannon capacity, it is implicit that the underlying sample space $\Omega$ is infinite, so that such a surjection always exists for each $t \in \mathbb{Z}_{\ge 0}$.
V. State Estimation of Linear Systems over Erroneous Channels

In this section, maximin information is used to study the problem of estimating the states of a linear time-invariant 
(LTI) plant via a stationary memoryless uncertain channel (Definition 4.1), without channel feedback. First, some 
related prior work is discussed. 



A. Prior Work 

In the case where the channel is an errorless digital bit-pipe, the state estimation problem is formally equivalent 
to feedback stabilization with control inputs known to both encoder and decoder. The central result in this scenario 
is the so-called "data rate theorem", which states that the estimation error or plant state can be stabilized or taken to 
zero iff the sum H of the log-magnitudes of the unstable eigenvalues of the system is less than the channel bit-rate. 
This condition holds in both deterministic and probabilistic settings, and under different notions of convergence or 
stability, e.g. uniform, rth moment or almost surely (a.s.) [17], [18], [19], [20], [21], [22], [23]. See also [24] for recent work on quantized estimation of stochastic LTI systems.

However, if transmission errors occur, then the stabilizability and estimation conditions become highly dependent 
on the setting and objective, leading to a variety of different criteria. For instance, given a stochastic discrete 
memoryless channel (DMC) and a noiseless LTI system with random initial state, a.s. convergence of the state or 
estimation error to zero is possible if and (almost) only if the ordinary channel capacity $C > H$; this was proved for digital packet-drop channels with acknowledgements in [25], and for general DMC's with or without channel feedback in [26]. The same result also holds for asymptotic stabilizability via an additive white Gaussian noise channel [27], with no channel feedback. See also [28] for bounds on mean-square-error convergence rates for state estimation over stochastic DMC's, without channel feedback.

Suppose next that additive stochastic noise perturbs the plant and the objective is to bound the rth moment of 
the states or estimation errors. Assuming channel feedback, bounded noise and scalar states, the achievability of 
this goal is determined by the anytime capacity of the channel [29]. Other related articles are [30], [31], [32]: the first two consider moment stabilization over errorless channels with randomly varying bit-rates known to both transmitter and receiver, and the last studies mean-square stabilization via DMC's with no channel feedback. See also the recent papers [33], [34] for explicit constructions of error-correcting codes for control.

For the purposes of this section, the most relevant prior work is [14] (see also [35]), in which the channel is modelled as a stochastic DMC, and the plant is LTI with random initial state but is perturbed by additive nonstochastic bounded disturbances. It was shown that if channel feedback is absent, then a.s. uniformly bounded estimation errors are possible iff $H < C_0$, the zero-error capacity [11] of the channel. However, under perfect channel feedback the necessary and sufficient condition becomes $H < C_{0f}$, the zero-error feedback capacity defined in [11];
the same criterion applies if the goal is to stabilize the plant states in the a.s. uniformly bounded sense, with or 
without channel feedback. As Co and Cof are (often strictly) less than C, both these conditions are more restrictive 
than for plants with stochastic or no process noise, even if the disturbance bound is arbitrarily small. In rough terms, 
the reason for the increased strictness is that nonstochastic disturbances do not enjoy a law of large numbers that 
averages them out in the long run. As a result it becomes crucial for no decoding errors to occur in the channel, not 
just for their average probability to be arbitrarily small. This important result was proved using probability theory, 
a law of large numbers and volume-partitioning arguments, but no information theory. 

The scenarios considered in this section are similar to [14], with the chief difference being that neither the initial state nor the erroneous channel is modelled stochastically here. As a consequence, probability and the law of large numbers cannot be employed in the analysis. Instead, maximin information is applied to yield necessary conditions that are then shown to be tight (Thms. 5.1 and 5.2). Only state estimation without channel feedback is considered here, since the maximin-information theoretic analysis of systems with feedback is significantly different; see [16] for some preliminary results.

In what follows, $\|\cdot\|$ denotes either the maximum norm on a finite-dimensional real vector space or the matrix norm it induces, and $B_l(x)$ denotes the corresponding $l$-ball $\{y : \|y - x\| \le l\}$ centered at $x$.
B. Disturbance-Free LTI Systems

Consider an undisturbed linear time-invariant (LTI) system

\begin{align*}
X(t+1) &= A X(t) \in \mathbb{R}^n, \tag{29} \\
Y(t) &= G X(t) \in \mathbb{R}^p, \quad t \in \mathbb{Z}_{\ge 0}, \tag{30}
\end{align*}

where the initial state $X(0)$ is an uncertain variable (uv). The output signal $Y$ is causally encoded via an operator $\gamma$ as

\[ S(t) = \gamma(t, Y(0:t)) \in \mathbf{S}, \quad t \in \mathbb{Z}_{\ge 0}. \tag{31} \]

Each symbol $S(t)$ is then transmitted over a stationary memoryless uncertain channel with set-valued transition function $T : \mathbf{S} \to 2^{\mathbf{Q}}$ and input function space $\mathbf{S}^\infty$ (Definition 4.1), yielding a received symbol $Q(t) \in \mathbf{Q}$. Note that the encoder is told nothing about the values of these received symbols, i.e. there is no channel feedback. These symbols are used to produce a causal prediction $\hat{X}(t+1)$ of $X(t+1)$ by means of another operator $\eta$ as

\[ \hat{X}(t+1) = \eta(t, Q(0:t)) \in \mathbb{R}^n, \quad t \in \mathbb{Z}_{\ge 0}, \quad \hat{X}(0) = 0. \tag{32} \]

Let $E(t) := X(t) - \hat{X}(t)$ denote the prediction error.

The pair $(\gamma, \eta)$ is called a coder-estimator. Such a pair is said to yield $\rho$-exponential uniformly bounded errors if for any uv $X(0)$ with range $[\![X(0)]\!] \subseteq B_l(0)$,

\[ \sup_{t \in \mathbb{Z}_{\ge 0},\, \omega \in \Omega} \rho^{-t} \|E(t)\| = \sup_{t \in \mathbb{Z}_{\ge 0}} \sup\, [\![\rho^{-t} \|E(t)\|]\!] < \infty, \tag{33} \]

where $l, \rho > 0$ are specified parameters. If the stronger property

\[ \lim_{t \to \infty} \sup_{\omega \in \Omega} \rho^{-t} \|E(t)\| = \lim_{t \to \infty} \sup\, [\![\rho^{-t} \|E(t)\|]\!] = 0 \tag{34} \]

holds, then $\rho$-exponential uniform convergence is said to be achieved.

Impose the following assumptions:

DF1: The pair $(G, A)$ in (29)-(30) is observable.

DF2: For every $t \in \mathbb{Z}_{\ge 0}$, the channel output sequence $Q(0:t)$ (Definition 4.1) is conditionally unrelated (Definition 2.1) with initial state $X(0)$, given the channel input sequence $S(0:t)$; i.e. $X(0) \leftrightarrow S(0:t) \leftrightarrow Q(0:t)$.

DF3: The convergence parameter $\rho$ of (33)-(34) is strictly smaller than the spectral radius of $A$.

Remarks: Condition DF1 can be relaxed to requiring the observability of $A$ on the invariant subspace corresponding to eigenvalues greater than or equal to $\rho$ in magnitude. Assumption DF2 basically states that the channel outputs can depend on the initial state only via the channel inputs. Condition DF3 entails negligible loss of generality, since if $\rho$ were to exceed the largest plant eigenvalue magnitude $|\lambda_{\max}|$, then the trivial estimator $\hat{X}(t) = 0$ would achieve (34) and communication would not be needed.

The main result of this subsection is given below: 

Theorem 5.1: Consider the linear time-invariant system (29)-(30), with plant matrix $A \in \mathbb{R}^{n \times n}$, uncertain initial state $X(0)$ and outputs that are coded and estimated (31)-(32) without channel feedback, via a stationary memoryless uncertain channel (Definition 4.1) with zero-error capacity $C_0 \ge 0$ (25). Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$ and suppose that Assumptions DF1-DF3 hold.

If there exists a coder-estimator that yields $\rho$-exponential uniformly bounded estimation errors (33) with respect to a nonempty $l$-ball $B_l(0) \subset \mathbb{R}^n$ of initial states, then

\[ C_0 \ge \sum_{i \in [1:n] \,:\, |\lambda_i| \ge \rho} \log \left| \frac{\lambda_i}{\rho} \right| =: H_\rho. \tag{35} \]

Conversely, if the inequality in (35) holds strictly, then a coder-estimator without channel feedback can be constructed to yield $\rho$-exponential uniform convergence (34) on any initial-state $l$-ball.
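The right-hand side of (35) is straightforward to evaluate once the eigenvalues of $A$ are known. The short Python sketch below computes it from a list of eigenvalues; the function name `rate_threshold`, the example eigenvalues and the use of base-2 logarithms (so that the result is in bits per sample) are assumptions made for illustration.

```python
from math import log2

def rate_threshold(eigenvalues, rho=1.0):
    """Sum of log2|lambda_i / rho| over eigenvalues with |lambda_i| >= rho,
    i.e. the RHS of (35) with base-2 logarithms (an assumption)."""
    return sum(log2(abs(lam) / rho) for lam in eigenvalues if abs(lam) >= rho)

# Hypothetical plant spectrum: one real unstable mode, one stable mode,
# and a complex-conjugate pair of magnitude sqrt(2).
example_eigenvalues = [2.0, 0.5, 1 + 1j, 1 - 1j]
```

With these example eigenvalues the threshold is 2 bits per sample for $\rho = 1$, and it rises to 5 bits per sample for $\rho = 0.5$, reflecting that a faster required decay rate counts more modes and enlarges each contribution.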



^The case $\rho = |\lambda_{\max}|$ introduces technicalities that can be handled by modifying the arguments below; for the sake of conciseness it is not explicitly treated here.
1) Proof of Necessity: The necessity of (35) is established first. Without loss of generality, let the state coordinates be chosen so that $A$ is in real Jordan canonical form (see e.g. [36], Theorem 3.4.5), i.e. it consists of $m$ square blocks on its diagonal, with the $j$th block $A_j \in \mathbb{R}^{n_j \times n_j}$ having either identical real eigenvalues or identical complex eigenvalues and conjugates, for each $j \in [1:m]$. Let the blocks be ordered by descending eigenvalue magnitude. For any $j \in [1:m]$, let $X_j(t) \in \mathbb{R}^{n_j}$ comprise those components of $X(t)$ governed by the $j$th real Jordan block $A_j$, and let $E_j(t), \hat{X}_j(t) \in \mathbb{R}^{n_j}$ consist of the corresponding components of $E(t)$ and $\hat{X}(t)$, respectively.

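Let $d \in [0:n]$ denote the number of eigenvalues with magnitude $> \rho$, including repeats. Pick arbitrary $\tau \in \mathbb{N}$ and

\[ \varepsilon \in \left( 0,\ 1 - \max_{i : |\lambda_i| > \rho} \frac{\rho}{|\lambda_i|} \right), \tag{36} \]

and then divide the interval $[-l, l]$ on the $i$th axis into

\[ k_i := \left\lfloor \left| \frac{(1-\varepsilon)\lambda_i}{\rho} \right|^{\tau} \right\rfloor \tag{37} \]

equal subintervals of length $2l/k_i$, for each $i \in [1:d]$. Denote the midpoints of the subintervals so formed by $p_i(s)$, $s = 1, \ldots, k_i$, and inside each subinterval construct an interval $I_i(s)$ centred at $p_i(s)$ but of shorter length $l/k_i$. Define a hypercuboid family

\[ \mathcal{H} := \left\{ \left( \prod_{i=1}^{d} I_i(s_i) \right) \times [-l, l]^{n-d} : s_i \in [1:k_i],\ i \in [1:d] \right\}, \tag{38} \]

and observe that any two hypercuboids $\in \mathcal{H}$ are separated by a distance of $l/k_i$ along the $i$th axis for each $i \in [1:d]$. Set the initial state range $[\![X(0)]\!] = \bigcup_{H \in \mathcal{H}} H \subset B_l(0)$.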
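The one-dimensional building block of this construction can be sketched in a few lines of Python. The function below follows (37) for a single axis: it splits $[-l, l]$ into $k$ equal subintervals and places a shorter interval of length $l/k$ at each midpoint. The function name and the numerical parameter values in the usage note are hypothetical choices for illustration.

```python
from math import floor

def separated_intervals(l, lam, rho, eps, tau):
    """Return (k, intervals) where k = floor(|(1 - eps) * lam / rho| ** tau)
    and intervals are the (lo, hi) pairs of length l / k centred at the
    midpoints of the k equal subintervals of [-l, l]."""
    k = floor(abs((1 - eps) * lam / rho) ** tau)
    width = 2 * l / k                      # length of each subinterval
    intervals = []
    for s in range(k):
        mid = -l + (s + 0.5) * width       # midpoint p(s) of subinterval s
        intervals.append((mid - l / (2 * k), mid + l / (2 * k)))
    return k, intervals
```

For example, `separated_intervals(1.0, 3.0, 1.0, 0.1, 2)` gives $k = 7$, and consecutive intervals are separated by a gap of $2l/k - l/k = l/k$, which is the separation used later in the proof.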
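As $[\![E_j(t)]\!] \supseteq [\![E_j(t)|q(0:t-1)]\!]$,

\begin{align*}
\operatorname{diam}[\![E_j(t)]\!] &\ge \operatorname{diam}[\![E_j(t)|q(0:t-1)]\!] \\
&= \operatorname{diam}[\![A_j^t X_j(0) - \eta_j(t, q(0:t-1)) \,|\, q(0:t-1)]\!] \\
&= \operatorname{diam}[\![A_j^t X_j(0) \,|\, q(0:t-1)]\!] \tag{39} \\
&= \sup_{u,v \in [\![X_j(0)|q(0:t-1)]\!]} \|A_j^t (u - v)\| \\
&\ge \sup_{u,v \in [\![X_j(0)|q(0:t-1)]\!]} \frac{\|A_j^t (u - v)\|_2}{\sqrt{n_j}} \\
&\ge \sup_{u,v \in [\![X_j(0)|q(0:t-1)]\!]} \frac{\sigma_{\min}(A_j^t)\, \|u - v\|_2}{\sqrt{n_j}} \tag{40} \\
&\ge \sup_{u,v \in [\![X_j(0)|q(0:t-1)]\!]} \frac{\sigma_{\min}(A_j^t)\, \|u - v\|}{\sqrt{n_j}} \\
&= \frac{\sigma_{\min}(A_j^t)}{\sqrt{n_j}} \operatorname{diam}[\![X_j(0)|q(0:t-1)]\!], \quad \forall t \in \mathbb{Z}_{\ge 0},\ q(0:t-1) \in [\![Q(0:t-1)]\!], \tag{41}
\end{align*}

where $\operatorname{diam}(\cdot)$ denotes set diameter under the maximum norm; (39) holds since translating a set in a normed space does not change its diameter; $\|\cdot\|_2$ denotes Euclidean norm; and $\sigma_{\min}(\cdot)$ denotes smallest singular value.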

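Now, an asymptotic identity of Yamamoto states that $\lim_{t \to \infty} \left( \sigma_{\min}(A_j^t) \right)^{1/t} = |\lambda_{\min}(A_j)|$, where $\lambda_{\min}(\cdot)$ denotes smallest-magnitude eigenvalue (see e.g. [37], Thm 3.3.21). As there are only finitely many blocks $A_j$, $\exists t_\varepsilon \in \mathbb{Z}_{\ge 0}$ s.t.

\[ \sigma_{\min}(A_j^t) \ge \left(1 - \frac{\varepsilon}{2}\right)^{t} |\lambda_{\min}(A_j)|^{t}, \quad j \in [1:m],\ t \ge t_\varepsilon. \tag{42} \]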
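Yamamoto's identity can be observed numerically on a small example. The Python sketch below is restricted to real 2x2 matrices so that singular values have a closed form; the helper names and the Jordan block used in the usage note are assumptions for illustration. The smallest singular value is computed as $|\det| / \sigma_{\max}$ to avoid cancellation when the two singular values differ by many orders of magnitude.

```python
from math import sqrt

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(a, t):
    """t-th power of a 2x2 matrix by repeated multiplication."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(t):
        result = mat_mul(result, a)
    return result

def sigma_min(m):
    """Smallest singular value of a real 2x2 matrix, as |det| / sigma_max."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d            # squared Frobenius norm
    det = a * d - b * c
    sigma_max = sqrt((s + sqrt(max(s * s - 4 * det * det, 0.0))) / 2)
    return abs(det) / sigma_max

def yamamoto_root(a, t):
    """(sigma_min(a^t))^(1/t), which tends to |lambda_min(a)| as t grows."""
    return sigma_min(mat_pow(a, t)) ** (1.0 / t)
```

For the Jordan block `[[2.0, 1.0], [0.0, 2.0]]`, whose only eigenvalue is 2, the value of `yamamoto_root` approaches 2 from below as `t` increases, though slowly; this slow approach is why (42) only holds beyond some finite time and with the slack factor $(1 - \varepsilon/2)^t$.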

In addition, for any region $K$ in a normed vector space,

\[ \operatorname{diam}(K) = \sup_{u,v \in K} \|u - v\| \le \sup_{u,v \in K} \left( \|u\| + \|v\| \right) = 2 \sup_{u \in K} \|u\|. \tag{43} \]
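By (33), there then exists $\phi > 0$ such that

\begin{align*}
\phi \rho^{t} \ge \sup [\![\|E(t)\|]\!] &\ge \sup [\![\|E_j(t)\|]\!] \overset{(43)}{\ge} 0.5 \operatorname{diam}[\![E_j(t)]\!] \\
&\overset{(41),(42)}{\ge} \frac{\left| (1 - \varepsilon/2)\, \lambda_{\min}(A_j) \right|^{t}}{2\sqrt{n_j}} \operatorname{diam}[\![X_j(0)|q(0:t-1)]\!], \quad j \in [1:m],\ t \ge t_\varepsilon. \tag{44}
\end{align*}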



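For some $\tau \in \mathbb{N}$, the hypercuboid family $\mathcal{H}$ (38) is an $[\![X(0)|Q(0:\tau-1)]\!]$-overlap isolated partition (Definition 3.1) of $[\![X(0)]\!]$. To see this, suppose in contradiction that $\exists H \in \mathcal{H}$ that is overlap connected in $[\![X(0)|Q(0:\tau-1)]\!]$ with another hypercuboid in $\mathcal{H}$. Then there would exist a set $[\![X(0)|q(0:\tau-1)]\!]$ containing a point $u \in H$ and a point $v$ in some $H' \in \mathcal{H} \setminus \{H\}$. Thus $u_j, v_j \in [\![X_j(0)|q(0:\tau-1)]\!]$, implying

\[ \|u_j - v_j\| \le \operatorname{diam}[\![X_j(0)|q(0:\tau-1)]\!] \overset{(44)}{\le} \frac{2\sqrt{n_j}\, \phi\, \rho^{\tau}}{\left| (1 - \varepsilon/2)\, \lambda_{\min}(A_j) \right|^{\tau}}, \quad j \in [1:m],\ \tau \ge t_\varepsilon. \tag{45} \]

However, by construction any two hypercuboids $\in \mathcal{H}$ are disjoint and separated by a distance of at least $l/k_i$ along the $i$th axis for each $i \in [1:d]$. Thus if $A_j$ is the real Jordan block corresponding to some eigenvalue $\lambda_i$, $i \in [1:d]$, then

\[ \|u_j - v_j\| \ge \frac{l}{k_i} \overset{(37)}{=} \frac{l}{\left\lfloor |(1-\varepsilon)\lambda_i/\rho|^{\tau} \right\rfloor} \ge \frac{l}{|(1-\varepsilon)\lambda_i/\rho|^{\tau}} = \frac{l \rho^{\tau}}{\left| (1-\varepsilon)\, \lambda_{\min}(A_j) \right|^{\tau}}, \]

since all the eigenvalues of $A_j$ have equal magnitudes. The RHS of this would exceed the RHS of (45) when $\tau \ge \max(t_\varepsilon, t')$ is sufficiently large that $\left( \frac{1 - \varepsilon/2}{1 - \varepsilon} \right)^{\tau} > 2\sqrt{n_j}\, \phi / l$, yielding a contradiction.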
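As $\mathcal{H}$ is an $[\![X(0)|Q(0:\tau-1)]\!]$-overlap isolated partition of $[\![X(0)]\!]$ for sufficiently large $\tau$,

\begin{align*}
2^{I_*[X(0);Q(0:\tau-1)]} &\overset{(16)}{=} \left| [\![X(0)|Q(0:\tau-1)]\!]_* \right| \overset{(15)}{\ge} |\mathcal{H}| = \prod_{i=1}^{d} k_i = \prod_{i=1}^{d} \left\lfloor \left| \frac{(1-\varepsilon)\lambda_i}{\rho} \right|^{\tau} \right\rfloor \\
&\ge \prod_{i=1}^{d} 0.5 \left| \frac{(1-\varepsilon)\lambda_i}{\rho} \right|^{\tau} \tag{46} \\
&= \frac{(1-\varepsilon)^{d\tau} \left| \prod_{i=1}^{d} \lambda_i \right|^{\tau}}{2^{d} \rho^{d\tau}}, \tag{47}
\end{align*}

where (46) follows from (36) and the inequality $\lfloor x \rfloor > x/2$, for every $x \ge 1$. However, since $X(0) \leftrightarrow S(0:\tau-1) \leftrightarrow Q(0:\tau-1)$ is a Markov uncertainty-chain (Definition 2.2),

\[ I_*[X(0);Q(0:\tau-1)] \overset{\text{Lem.}}{\le} I_*[S(0:\tau-1);Q(0:\tau-1)] \overset{\text{Def. 4.2, Thm 4.1}}{\le} \tau C_0. \]

Substituting this into the LHS of (47), taking logarithms, dividing by $\tau$ and then letting $\tau \to \infty$ yields

\[ C_0 \ge d \log(1-\varepsilon) + \sum_{i=1}^{d} \log \left| \frac{\lambda_i}{\rho} \right|. \]

As $\varepsilon$ may be arbitrarily small, this establishes the necessity of (35).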
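2) Proof of Sufficiency: The sufficiency of (35) is straightforward to establish. Define new state and measurement vectors $X'(t) = \rho^{-t} X(t)$ and $Y'(t) = \rho^{-t} Y(t)$, for every $t \in \mathbb{Z}_{\ge 0}$. In these new coordinates, the system equations (29)-(30) become

\begin{align*}
X'(t+1) &= (A/\rho) X'(t) \in \mathbb{R}^n, \tag{48} \\
Y'(t) &= G X'(t) \in \mathbb{R}^p, \quad t \in \mathbb{Z}_{\ge 0}. \tag{49}
\end{align*}

Let $H_\rho$ denote the RHS of (35). By (35) and (25), $\forall \delta \in (0, C_0 - H_\rho)$ $\exists t_\delta > 0$ s.t. $\forall \tau \ge t_\delta$, $\exists$ a finite set $F \subseteq \mathbf{S}^{\tau}$ with $\max_{q(0:\tau-1) \in \mathbf{Q}^{\tau}} |F \cap R(q(0:\tau-1))| = 1$ and

\[ H_\rho < C_0 - \delta < (\log|F|)/\tau. \tag{50} \]

Down-sample (48)-(49) by $\tau$ to obtain the LTI system

\begin{align*}
X'((k+1)\tau) &= (A/\rho)^{\tau} X'(k\tau) \in \mathbb{R}^n, \tag{51} \\
Y'(k\tau) &= G X'(k\tau) \in \mathbb{R}^p, \quad \forall k \in \mathbb{Z}_{\ge 0}. \tag{52}
\end{align*}

Now, $|F|$ distinct codewords can be transmitted over the channel and decoded without error once every $\tau$ samples. Furthermore $\log|F| > \tau H_\rho$ = sum of the unstable eigenvalue log-magnitudes of $(A/\rho)^{\tau}$. By the "data rate theorem" (see e.g. [17]), there then exists a coder-estimator for the LTI down-sampled system (51)-(52) that estimates the states of (51) with errors $\|X'(k\tau) - \hat{X}'_k\|$ tending uniformly to 0. For every $t \in \mathbb{Z}_{\ge 0}$, write $t = k\tau + r$ for some $k \in \mathbb{Z}_{\ge 0}$ and $r \in [0:\tau-1]$, and define an estimator

\[ \hat{X}(t) := \rho^{k\tau} A^{r} \hat{X}'_k. \]

Then

\begin{align*}
\rho^{-t} \sup \|X(t) - \hat{X}(t)\| &= \rho^{-(k\tau + r)} \sup \left\| \rho^{k\tau} A^{r} X'(k\tau) - \rho^{k\tau} A^{r} \hat{X}'_k \right\| \\
&\le \rho^{-r} \|A^{r}\| \sup \|X'(k\tau) - \hat{X}'_k\| \\
&\le \max_{r \in [0:\tau-1]} \left\{ \rho^{-r} \|A^{r}\| \right\} \sup \|X'(k\tau) - \hat{X}'_k\| \to 0
\end{align*}

as $t$, and hence $k = \lfloor t/\tau \rfloor$, tend to $\infty$.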

C. LTI Systems with Disturbances 

The results and techniques of the previous subsection can be readily adapted to analyze systems with disturbances. 



Suppose that, instead of (29)-(30), the plant state and output equations are

\begin{align*}
X(t+1) &= A X(t) + V(t) \in \mathbb{R}^n, \tag{53} \\
Y(t) &= G X(t) + W(t) \in \mathbb{R}^p, \quad t \in \mathbb{Z}_{\ge 0}, \tag{54}
\end{align*}

where the uncertain signals $V$ and $W$ represent additive process and measurement noise. The objective is uniform boundedness, i.e. (33) with $\rho = 1$. Make the following assumptions:

D1: The plant dynamics (53) are strictly unstable, i.e. the matrix $A$ has spectral radius strictly larger than 1.

D2: The uncertain noise signals $V$ and $W$ are uniformly bounded, i.e. $\exists c > 0$ s.t. all possible signal realizations $v \in [\![V]\!]$ and $w \in [\![W]\!]$ have $\ell^\infty$-norms $\|v\|, \|w\| \le c$.

D3: The zero sequence is a possible process and measurement noise realization, i.e. $0 \in [\![V]\!] \cap [\![W]\!]$.

D4: The initial state $X(0)$, $V$ and $W$ are mutually unrelated (Definition 2.1).

D5: For every $t \in \mathbb{Z}_{\ge 0}$, the channel output sequence $Q(0:t)$ (Definition 4.1) is conditionally unrelated (Definition 2.1) with $(X(0), V(0:t-1), W(0:t))$, given the channel input sequence $S(0:t)$, i.e. $(X(0), V(0:t-1), W(0:t)) \leftrightarrow S(0:t) \leftrightarrow Q(0:t)$.
The following result holds: 

Theorem 5.2: Consider a linear time-invariant plant (53)-(54), with plant matrix $A \in \mathbb{R}^{n \times n}$, uncertain initial state $X(0)$, and bounded uncertain signals $V$ and $W$ additively corrupting the dynamics and outputs respectively. Suppose the plant outputs are coded and estimated (31)-(32) without feedback via a stationary memoryless uncertain channel (Definition 4.1) having zero-error capacity $C_0 \ge 0$ (25), and assume conditions DF1 and D1-D5.

If there exists a coder-estimator (31)-(32) yielding uniformly bounded estimation errors with respect to a nonempty $l$-ball $B_l(0) \subset \mathbb{R}^n$ of initial states, then

\[ C_0 \ge \sum_{i \in [1:n] \,:\, |\lambda_i| \ge 1} \log|\lambda_i| =: H, \tag{55} \]

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.

Conversely, if (55) holds as a strict inequality, then a coder-estimator can be constructed to yield uniform boundedness for any given $l$-ball of initial states.

Proof: Necessity is straightforward. If a coder-estimator achieves uniform boundedness, then this uniform bound is not exceeded if the uncertain disturbances are realized as the zero signal, which by hypothesis is an element of both $[\![V]\!]$ and $[\![W]\!]$. By unrelatedness $[\![X(0)|V = 0, W = 0]\!] = [\![X(0)]\!]$, so the initial state range is unchanged. Furthermore, condition D5 implies $X(0) \leftrightarrow S(0:t) \leftrightarrow Q(0:t)$, i.e. condition DF2. As uniform boundedness is just $\rho$-exponential uniform boundedness with $\rho = 1$ (33), Theorem 5.1 applies immediately to yield (55).

The sufficiency of (55) is established next. By (55) and (25), $\forall \delta \in (0, C_0 - H)$ $\exists t_\delta > 0$ s.t. $\forall \tau \ge t_\delta$, $\exists$ a finite set $F \subseteq \mathbf{S}^{\tau}$ with $\max_{q(0:\tau-1) \in \mathbf{Q}^{\tau}} |F \cap R(q(0:\tau-1))| = 1$ and

\[ H < C_0 - \delta < (\log|F|)/\tau. \tag{56} \]

Down-sample (53)-(54) by $\tau$ to obtain the LTI system

\begin{align*}
X((k+1)\tau) &= A^{\tau} X(k\tau) + V'_{\tau}(k) \in \mathbb{R}^n, \tag{57} \\
Y(k\tau) &= G X(k\tau) + W(k\tau) \in \mathbb{R}^p, \quad k \in \mathbb{Z}_{\ge 0}, \tag{58}
\end{align*}

where the accumulated noise term $V'_r(k) := \sum_{i=0}^{r-1} A^{r-1-i} V(k\tau + i)$ can be shown to be uniformly bounded over $k \in \mathbb{Z}_{\ge 0}$ for each $r \in [0:\tau]$. Now, $|F|$ distinct codewords can be transmitted over the channel and decoded without error once every $\tau$ samples. Furthermore $\log|F| > \tau H$ = sum of the unstable eigenvalue log-magnitudes of $A^{\tau}$. By the "data rate theorem" for LTI systems with bounded disturbances controlled or estimated over errorless channels (see e.g. [17], [21], [19]), there then exists a coder-estimator for the LTI down-sampled system (57)-(58) that estimates its states with errors $X(k\tau) - \hat{X}_k$ uniformly bounded over $k \in \mathbb{Z}_{\ge 0}$.

For every $t \in \mathbb{Z}_{\ge 0}$, write $t = k\tau + r$ for some $k \in \mathbb{Z}_{\ge 0}$ and $r \in [0:\tau-1]$, and define an estimator

\[ \hat{X}(t) := A^{r} \hat{X}_k. \]

Then

\begin{align*}
\sup \|X(t) - \hat{X}(t)\| &= \sup \left\| A^{r} X(k\tau) + V'_r(k) - A^{r} \hat{X}_k \right\| \\
&\le \|A^{r}\| \sup \|X(k\tau) - \hat{X}_k\| + \sup \|V'_r(k)\| \\
&\le \max_{r \in [0:\tau-1]} \left\{ \|A^{r}\| \right\} \sup_{\omega \in \Omega} \|X(k\tau) - \hat{X}_k\| + \max_{r \in [0:\tau-1]} \sup \|V'_r(k)\|.
\end{align*}

As the RHS is uniformly bounded over $k \in \mathbb{Z}_{\ge 0}$, the proof is complete. □



D. Discussion 

Like the results of Matveev and Savkin [14] on LTI state estimation via an erroneous channel without feedback, Thms. 5.1 and 5.2 involve the zero-error capacity of the channel. In their formulation, the process and measurement noise are treated as bounded unknown deterministic signals, but the channel and initial state are modelled probabilistically. The estimation objective is to achieve estimation errors that, with probability (w.p.) 1, are uniformly bounded over all admissible disturbances, and the necessity part of their result was proved with the aid of a law of large numbers.

The main aims of this section have been to demonstrate firstly, that statistical assumptions are not necessary to capture the essence of this problem (modulo zero-probability events); and secondly, that even with no probabilistic structure to exploit, information-theoretic techniques can be successfully applied, based on I*. Although the channel and initial state here are modelled nonstochastically and, furthermore, the estimation errors are to be bounded uniformly over all samples $\omega \in \Omega$, not just w.p. 1, the achievability criterion (55) of subsection V-C essentially recovers the earlier result.

In addition, unlike [14] and Theorem 5.2, Theorem 5.1 assumes no disturbances and concerns performance as measured by a specific convergence rate, not just bounded errors. The criterion (35) agrees with [14] when $\rho = 1$, but is more (less) stringent when $\rho < (>)\, 1$. It applies when, for instance, the states of a possibly stable noiseless LTI plant are to be remotely estimated with errors decaying at or faster than a specified speed $\rho$.



VI. Conclusion 

In this paper a formal framework for modelling nonstochastic variables was proposed, leading to analogues 
of probabilistic ideas such as independence and Markov chains. Using this framework, the concept of maximin 
information was introduced, and it was proved that the zero-error capacity $C_0$ of a stationary memoryless uncertain
channel coincides with the highest rate of maximin information across it. Finally, maximin information was applied 
to the problem of reconstructing the states of a linear time-invariant (LTI) system via such a channel. Tight 
criteria involving Co were found for the achievability of uniformly bounded and uniformly exponentially converging 
estimation errors, without any statistical assumptions. 

An open question is whether maximin information can be used in the presence of feedback. Two challenges 
present themselves. Firstly, the equivalence between the problems of state estimation and control in the errorless 
case is lost if channel errors occur, because the encoder does not necessarily know what the decoder received. 
Secondly, from [14], [35] it is known that for both the problems of LTI state estimation with channel feedback and LTI control, the relevant channel figure-of-merit for achieving a.s. bounded estimation errors or states respectively is its zero-error feedback capacity $C_{0f}$, which can be strictly larger than $C_0$ [11].

These issues suggest that nontrivial modifications of the techniques presented here may be required to study feedback systems. Preliminary results concerning this problem will be presented in the upcoming conference paper [16].
Appendix A
Proof of Lemma 2.1

By (6) it need only be established that the RHS of the identity in the lemma is contained in its LHS. Pick any $(y_1, \ldots, y_m) \in [\![Y_1, \ldots, Y_m]\!]$ and any point $x$ in the RHS. For every $i \in [1, \ldots, m]$, $\exists \omega_i \in \Omega$ s.t. $X(\omega_i) = x$ and $Y_i(\omega_i) = y_i$, so that $y_i \in [\![Y_i|x]\!]$. By the conditional unrelatedness of $Y_1, \ldots, Y_m$ given $X$, it follows that $(y_1, \ldots, y_m) \in [\![Y_1, \ldots, Y_m|x]\!]$. That is, $\exists \omega \in \Omega$ with $X(\omega) = x$ and $Y_i(\omega) = y_i$, for each $i \in [1, \ldots, m]$. Thus $x \in [\![X|y_1, \ldots, y_m]\!]$, implying that the RHS is contained in the LHS. By (6), the LHS is also contained in the RHS, establishing equality.

Appendix B 

Proof of Lemma 3.1 (Unique Overlap Partition)

The first step is to establish the existence of an overlap partition. For any $x \in [\![X]\!]$, let $O(x)$ be the set of all points in $[\![X]\!]$ with which $x$ is overlap connected. Obviously $\mathcal{O} := \{O(x) : x \in [\![X]\!]\}$ is an $[\![X]\!]$-cover. Any two points in $O(x)$ are overlap connected, since they are both overlap connected with $x$. Furthermore, if any two sets $O(x)$ and $O(x')$ have some point $w$ in common, then they must coincide, since $x \leftrightsquigarrow w$ and $x' \leftrightsquigarrow w$ imply that $x \leftrightsquigarrow x'$. Moreover, if $O(x)$ and $O(z)$ are distinct, hence disjoint, then they are overlap isolated; otherwise some point $v$ would be overlap connected with both $x$ and $z$ and thus lie in $O(x) \cap O(z)$, which is impossible. Thus the family $\mathcal{O}$ is an overlap partition.
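For a finite joint range, the classes $O(x)$ of this construction can be computed with a union-find structure: all points lying in a common conditional range $[\![X|y]\!]$ are merged, and the transitive closure of these merges yields the overlap-connected classes. The Python sketch below is a hypothetical illustration; the function names and the example joint range are assumptions, and the maximin information is taken as the base-2 logarithm of the number of classes.

```python
from math import log2

def overlap_partition(joint_range):
    """Overlap-connected classes of [[X]] for a finite joint range [[X, Y]]
    given as a collection of (x, y) pairs."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    cond_ranges = {}                            # y -> list of x in [[X | y]]
    for x, y in joint_range:
        parent.setdefault(x, x)
        cond_ranges.setdefault(y, []).append(x)

    # Points that share a conditional range are overlap connected.
    for xs in cond_ranges.values():
        for x in xs[1:]:
            parent[find(x)] = find(xs[0])

    cells = {}
    for x in parent:
        cells.setdefault(find(x), set()).add(x)
    return list(cells.values())

def maximin_information(joint_range):
    """log2 of the number of overlap-partition cells (base 2 assumed)."""
    return log2(len(overlap_partition(joint_range)))

# Hypothetical joint range: x in {0,1,2} chained through y = 'a', 'b';
# x in {3,4} linked through y = 'c'.
example_range = {(0, 'a'), (1, 'a'), (1, 'b'), (2, 'b'), (3, 'c'), (4, 'c')}
```

For `example_range`, the classes are $\{0,1,2\}$ and $\{3,4\}$, giving one bit of maximin information, whereas a full rectangular joint range collapses to a single class and gives zero.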
To prove that it is unique, let $\mathcal{O}'$ be any overlap partition. Then every set $O' \in \mathcal{O}'$ must be contained in $O(x)$, for each $x \in O'$. However, $O(x)$ must also be included in $O'$. Otherwise there would be a point $q$ outside $O'$ that is overlap connected with $x$; this $q$ would have to lie in some set $Q \in \mathcal{O}' \setminus \{O'\}$, impossible since $Q$ must be overlap isolated from $O'$. Thus $O' = O(x)$ for each $x \in O'$, and so $\mathcal{O}' = \{O(x)\} = \mathcal{O}$.



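To establish (14), for any $C \in \mathcal{O}$ let $D := \{y \in [\![Y]\!] : [\![X|y]\!] \cap C \ne \emptyset\}$. As each element of $\mathcal{O}$ consists of all the points it is overlap connected with, it follows that $[\![X|y]\!] \subseteq C$, for each $y \in D$. Furthermore $[\![X|y']\!]$ and $C$ are overlap isolated and thus have null intersection, for every $y' \in [\![Y]\!] \setminus D$. Thus

\[ C = \bigcup_{B \in [\![X|Y]\!]} C \cap B = \bigcup_{B \in [\![X|Y]\!] \,:\, B \cap C \ne \emptyset} C \cap B = \bigcup_{B \in [\![X|Y]\!] \,:\, B \subseteq C} B. \]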



To prove (15), observe that every set $C \in [\![X|Y]\!]_*$ intersects exactly one set $P_C \in \mathcal{P}$, i.e. $P_C \supseteq C$. Otherwise, $C$ would also overlap some other set $P' \ne P_C$ in the partition $\mathcal{P}$; since $C$ is overlap-connected, this would imply that there is a point in $P_C$ and one in $P'$ that are overlap-connected, which is impossible since $\mathcal{P}$ is an overlap-isolated partition. Furthermore, since $[\![X|Y]\!]_*$ is a cover of $[\![X]\!]$, every set in $\mathcal{P}$ must intersect some set in it. Thus $C \mapsto P_C$ is a surjection from $[\![X|Y]\!]_* \to \mathcal{P}$, and so $|[\![X|Y]\!]_*| \ge |\mathcal{P}|$.

To prove the equality condition, observe that $\forall P \in \mathcal{P}$,

\[ P = \bigcup_{C \in [\![X|Y]\!]_*} C \cap P = \bigcup_{C \in [\![X|Y]\!]_* \,:\, C \cap P \ne \emptyset} C. \]

If $|[\![X|Y]\!]_*| = |\mathcal{P}|$, then $C \mapsto P_C$ is a bijection from $[\![X|Y]\!]_* \to \mathcal{P}$, and so the union above can only run over one set $C$. Consequently $P_C = C$, i.e. the bijection $C \mapsto P_C$ from $[\![X|Y]\!]_* \to \mathcal{P}$ is an identity.



Appendix C 

Proof of Lemma [372] (Taxicab- <^ Overlap-Connectedness) 

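With regard to the first statement, note that if $(x,y), (x',y') \in \llbracket X,Y \rrbracket$ are taxicab connected, then there is a taxicab
sequence

\[ (x,y_1), (x_2,y_1), (x_2,y_2), (x_3,y_2), \ldots, (x_{n-1},y_{n-1}), (x',y_{n-1}) \]

of points in $\llbracket X,Y \rrbracket$. This yields a sequence $\{\llbracket X|y_i \rrbracket\}_{i=1}^{n-1}$ of conditional ranges s.t. $x_i \in \llbracket X|y_i \rrbracket \cap \llbracket X|y_{i-1} \rrbracket \neq \emptyset$ for each
$i \in [2, \ldots, n-1]$, with $x \in \llbracket X|y_1 \rrbracket$ and $x' \in \llbracket X|y_{n-1} \rrbracket$. Thus $x \leftrightsquigarrow x'$.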

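To prove the reverse implication, suppose that $x \leftrightsquigarrow x'$ and pick any $y \in \llbracket Y|x \rrbracket$ and $y' \in \llbracket Y|x' \rrbracket$. Then $\exists$ a sequence
$\{\llbracket X|y_i \rrbracket\}_{i=1}^{n}$ of conditional ranges s.t. $\llbracket X|y_i \rrbracket \cap \llbracket X|y_{i-1} \rrbracket \neq \emptyset$, for each $i \in [2, \ldots, n]$, where $y_1 = y$ and $y_n = y'$. For
every $i \in [2, \ldots, n]$ pick an $x_i \in \llbracket X|y_i \rrbracket \cap \llbracket X|y_{i-1} \rrbracket$. Then the taxicab sequence

\[ (x,y_1), (x_2,y_1), (x_2,y_2), (x_3,y_2), \ldots, (x_n,y_n), (x',y_n) \]

comprises points in $\llbracket X,Y \rrbracket$. Thus $(x,y), (x',y')$ are taxicab connected in $\llbracket X,Y \rrbracket$.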

To prove the forward implication of the 2nd statement, note that if any $(x,y) \in A$ is taxicab connected with any
$(x',y') \in A$, then $x, x' \in A^+$ are overlap connected. Similarly, if every $x, x' \in A^+$ are overlap connected then for
each $y \in \llbracket Y|x \rrbracket$ and $y' \in \llbracket Y|x' \rrbracket$, $(x,y)$ is taxicab connected with $(x',y')$. The statement then follows by noting that
$A \subseteq \bigcup_{y \in \llbracket Y|x \rrbracket,\, x \in A^+} \{(x,y)\}$.
The 3rd statement ensues similarly. If every $(x,y) \in A$ is taxicab disconnected from any $(x',y') \in B$, then every
$x \in A^+$ is overlap disconnected from any $x' \in B^+$.

Similarly, if every $x \in A^+$ is overlap disconnected from any $x' \in B^+$, then $\forall y \in \llbracket Y|x \rrbracket$ and $y' \in \llbracket Y|x' \rrbracket$, $(x,y)$
is taxicab disconnected from $(x',y')$. The proof is completed by noting that $A \subseteq \bigcup_{y \in \llbracket Y|x \rrbracket,\, x \in A^+} \{(x,y)\}$ and
$B \subseteq \bigcup_{y' \in \llbracket Y|x' \rrbracket,\, x' \in B^+} \{(x',y')\}$.
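The first statement of the lemma can be checked numerically on a small finite joint range. The sketch below makes some assumptions: the set `joint` is a made-up example, the function names are hypothetical, and a taxicab step is taken to be a move between two points of the joint range that differ in at most one coordinate. It compares taxicab connectedness of every pair of points with overlap connectedness of their x-coordinates.

```python
from itertools import product

# Assumed finite joint range of (X, Y).
joint = {(1, 'a'), (2, 'a'), (2, 'b'), (3, 'b'), (4, 'c'), (5, 'c'), (5, 'd')}

def taxicab_component(start):
    """All points of `joint` reachable from `start` by a taxicab sequence,
    i.e. by steps that change one coordinate at a time and stay in `joint`."""
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for p in joint:
            if p not in seen and (p[0] == x or p[1] == y):
                seen.add(p)
                stack.append(p)
    return seen

def overlap_component(x):
    """All x-values overlap connected with x via the conditional ranges [[X|y]]."""
    cond = {}
    for u, y in joint:
        cond.setdefault(y, set()).add(u)
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        for b in cond.values():
            if u in b:
                for v in b - seen:
                    seen.add(v)
                    stack.append(v)
    return seen

# (x, y) and (x', y') are taxicab connected iff x and x' are overlap connected.
agree = all(
    (q in taxicab_component(p)) == (q[0] in overlap_component(p[0]))
    for p, q in product(joint, repeat=2)
)
```

An exhaustive check over one example is of course no substitute for the proof; it only illustrates the equivalence that the two implications establish.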
Appendix D

Proof of Theorem 3.1 (Unique Taxicab Partition)

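For any set $C$ in the unique overlap partition $\llbracket X|Y \rrbracket_*$, define $C^- := \bigcup_{y \in \llbracket Y|x \rrbracket,\, x \in C} \{(x,y)\} \subseteq \llbracket X,Y \rrbracket$, and the cover
$\mathcal{T}^- := \{ C^- : C \in \llbracket X|Y \rrbracket_* \}$ of $\llbracket X,Y \rrbracket$.

By Lemma 3.2 the sets of $\mathcal{T}^-$ are individually taxicab connected and mutually taxicab isolated, so $\mathcal{T}^-$ is a
taxicab partition.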

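To establish uniqueness, note that if $\mathcal{P}$ is any taxicab partition, then by the same token its projection $\mathcal{P}^+$ is an
overlap partition, which by uniqueness must coincide with $\llbracket X|Y \rrbracket_*$. Thus $\forall P \in \mathcal{P}$,

\[ P \subseteq \bigcup_{y \in \llbracket Y|x \rrbracket,\, x \in P^+} \{(x,y)\} \in \mathcal{T}^-, \]

i.e. every set in $\mathcal{P}$ is inside a single set in $\mathcal{T}^-$. As $\mathcal{P}$ and $\mathcal{T}^-$ are partitions of $\llbracket X,Y \rrbracket$, it follows then that $P$ must
coincide exactly with an element of $\mathcal{T}^-$.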
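The construction used in this proof, lifting each block $C$ of the overlap partition to the set $C^-$ of joint points whose x-coordinate lies in $C$, can be compared against a direct computation of the taxicab partition. The Python sketch below is an illustration under assumptions: `joint` is a made-up finite joint range and `components` is a hypothetical helper that returns the connected components of a symmetric relation.

```python
# Assumed finite joint range of (X, Y).
joint = {(1, 'a'), (2, 'a'), (2, 'b'), (3, 'b'), (4, 'c'), (5, 'c'), (5, 'd')}

def components(nodes, linked):
    """Partition `nodes` into connected components of the relation `linked`."""
    remaining, parts = set(nodes), set()
    while remaining:
        seed = remaining.pop()
        comp, stack = {seed}, [seed]
        while stack:
            u = stack.pop()
            near = {v for v in remaining if linked(u, v)}
            remaining -= near
            comp |= near
            stack.extend(near)
        parts.add(frozenset(comp))
    return parts

# Overlap partition of the x-values: x and z are linked if some y is
# compatible with both, i.e. they lie in a common conditional range [[X|y]].
xs = {x for x, _ in joint}

def share_range(x, z):
    return any((x, y) in joint and (z, y) in joint for _, y in joint)

overlap_parts = components(xs, share_range)

# Lift each block C to C^- = {(x, y) in joint : x in C}.
lifted = {frozenset(p for p in joint if p[0] in c) for c in overlap_parts}

# Taxicab partition computed directly: two points are linked if they
# differ in at most one coordinate.
taxicab_parts = components(joint, lambda p, q: p[0] == q[0] or p[1] == q[1])
```

In this example the lifted blocks coincide with the directly computed taxicab components, so the two partitions also have the same number of sets.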



To prove (17), first observe that every set $D \in \mathcal{T}[X;Y]$ intersects exactly one set $Q_D \in \mathcal{Q}$, i.e. $Q_D \supseteq D$. Otherwise,
$D$ would also intersect some other set $Q' \neq Q_D$ in the partition $\mathcal{Q}$; since $D$ is taxicab-connected, this would imply
that there is a point in $Q_D$ and one in $Q'$ that are taxicab-connected, which is impossible since $\mathcal{Q}$ is a taxicab-isolated
partition. Furthermore, since $\mathcal{T}[X;Y]$ is a cover, every set in $\mathcal{Q}$ must intersect some set in it. Thus $D \mapsto Q_D$
is a surjection from $\mathcal{T}[X;Y] \to \mathcal{Q}$, and so $|\mathcal{T}[X;Y]| \geq |\mathcal{Q}|$.

Appendix E 
Proof of Lemma BTT] 

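Pick any $y(0:t) \in \mathcal{Y}^{t+1}$. As $\llbracket X_i|y_i \rrbracket \subseteq R(y_i)$ for each $i \in [0, \ldots, t]$, it follows that
$\llbracket X(0:t)|y(0:t) \rrbracket \subseteq \prod_{i=0}^{t} \llbracket X_i|y_i \rrbracket \subseteq \prod_{i=0}^{t} R(y_i)$. Moreover $\llbracket X(0:t)|y(0:t) \rrbracket \subseteq \llbracket X(0:t) \rrbracket$, thus establishing that the LHS of (22) is contained in the
RHS.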

It is now shown that the RHS is contained in the LHS, proving equality. If the RHS is empty then so is the
LHS, by the preceding argument, yielding the desired equality. If the RHS is not empty, pick an arbitrary element
$x(0:t)$ in it, i.e. $x(0:t) \in \llbracket X(0:t) \rrbracket$ and $x(i) \in R(y(i))$, for each $i \in [0, \ldots, t]$. By ([fl]), $y(i) \in T(x(i))$ for each
$i \in [0, \ldots, t]$, or equivalently $y(0:t) \in \prod_{i=0}^{t} T(x(i)) = \llbracket Y(0:t)|x(0:t) \rrbracket$. Thus $\exists \omega \in \Omega$ s.t. $Y(0:t)(\omega) = y(0:t)$ and
$X(0:t)(\omega) = x(0:t)$. This implies that $x(0:t) \in \llbracket X(0:t)|y(0:t) \rrbracket$. Thus the RHS of (22) is contained in the LHS,
completing the proof.
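The identity proved here can be illustrated on a toy channel. The Python sketch below rests on several assumptions that are not taken from the paper: the transition function `T`, the set of input sequences `input_range` and the helper names are all made up, and `R(y)` is taken to be the set of inputs `x` with `y` in `T(x)`. It enumerates the joint range of input and output sequences and compares the conditional range of the inputs with the input range intersected with the product of the sets `R(y_i)`.

```python
from itertools import product

# Hypothetical transition function T: input symbol -> set of possible outputs.
T = {0: {'a'}, 1: {'a', 'b'}, 2: {'b', 'c'}}

# Hypothetical range of input sequences x(0:t), here with t = 1.
input_range = {(0, 2), (1, 1), (2, 0), (1, 2)}

def R(y):
    """Assumed reverse transition: all inputs that can produce the output y."""
    return {x for x, outs in T.items() if y in outs}

# Joint range: every output sequence in the product of the T(x_i) can occur.
joint = {(xs, ys) for xs in input_range for ys in product(*(T[x] for x in xs))}

def conditional_range(ys):
    """LHS: input sequences that can occur together with the output sequence ys."""
    return {xs for xs, zs in joint if zs == ys}

def formula(ys):
    """RHS: the input range intersected with the product of the R(y_i)."""
    return input_range & set(product(*(R(y) for y in ys)))

# Compare both sides for every output sequence of length 2.
outputs = set(product('abc', repeat=2))
holds = all(conditional_range(ys) == formula(ys) for ys in outputs)
```

Output sequences that cannot occur give an empty set on both sides, which corresponds to the case of an empty RHS treated separately in the proof.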